Automating the process of narrowing down site traffic issues with Python lets you help your clients recover quickly.
This is the second part of a three-part series. In part one[1], I introduced our approach to nailing down the pages losing traffic. We call it the “winners vs losers” analysis. If you have a big site, reviewing the individual pages losing traffic as we did in part one might not give you a good sense of what the problem is. So, in part two we will create manual page groups using regular expressions. If you stick around to read part three, I will show you how to group pages automatically using machine learning.
You can find the code used in parts one, two, and three in this Google Colab notebook[2]. Let’s walk through part two and learn some Python.
Incorporating redirects
As the site we’re analyzing moved from one platform to another, the URLs changed, and a decent number of redirects were put in place. To track winners and losers more accurately, we want to follow the redirects from the first set of pages. We were not really comparing apples to apples in part one. If we want a fully accurate look at the winners and losers, we’ll have to discover where the source pages redirect to, then repeat the comparison.
1. Python requests
We’ll use the requests library[3], which simplifies making HTTP requests, to send an HTTP HEAD[4] request to each URL in our Google Analytics data set. If a URL returns a 3xx redirect, we’ll record the ultimate destination and re-run our winners and losers comparison.
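Here is a minimal sketch of that step. The helper name resolve_redirect and the timeout value are my own choices, not from the original notebook. One detail worth noting: requests.head() does not follow redirects by default, so we opt in with allow_redirects=True to chase the whole chain.

```python
import requests

def resolve_redirect(url, timeout=10):
    """Send a HEAD request and return the final URL after any 3xx redirects."""
    try:
        # HEAD requests don't follow redirects by default in the requests
        # library, so we explicitly ask it to follow the whole chain.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        if response.history:
            # response.history holds the intermediate 3xx responses;
            # response.url is the ultimate destination.
            return response.url
        return url  # no redirect: keep the original URL
    except requests.RequestException:
        # Timeouts, connection errors, malformed URLs: fall back to the original
        return url
```

You could then apply this to the URL column of your data set, for example (the data frame name ga_df and its page column are hypothetical, stand-ins for the Google Analytics data from part one):

```python
# Resolve each page to its post-redirect destination before re-running
# the winners vs losers comparison.
ga_df["final_url"] = ga_df["page"].apply(resolve_redirect)
```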