7

I'm using pytrends to extract Google Trends data, like this:

from pytrends.request import TrendReq
pytrend = TrendReq()
pytrend.build_payload(kw_list=['bitcoin'], cat=0, timeframe=from_date+' '+today_date)

And it returns an error:

ResponseError: The request failed: Google returned a response with code 429.

It worked yesterday, but for some reason it doesn't work now! The example code from GitHub fails too:

pytrends = TrendReq(hl='en-US', tz=360, proxies = {'https': 'https://34.203.233.13:80'})

How can I fix this? Thanks a lot!

WWH98932
  • Google will block your IP pretty fast if they suspect you of spamming, scraping, or any other kind of abuse of the system – Druta Ruslan May 28 '18 at 18:16
  • @zimdero Thanks, what can I do if I still want to use it? The "proxies" option doesn't work. – WWH98932 May 28 '18 at 18:20
  • I am actively trying to solve this exact same problem. I'm doing research for an academic paper; it was working yesterday, and today I haven't been able to get one successful response. Looking forward to finding a good solution... – lopezdp May 28 '18 at 23:09
  • @WWH98932 What are you working on? I am making the exact same search. I'm thinking we're going to have to download the CSV file from the web interface and then bring it into JupyterLab as a DataFrame. That's the approach I am leaning towards anyway. – lopezdp May 28 '18 at 23:37
  • @lopezdp Same, I'm doing it manually, it's really annoying. I'll go somewhere else tomorrow to see if it can change my IP address... – WWH98932 May 28 '18 at 23:53
  • To me, it seems like it has blocked any call from the script – peeyush singla May 29 '18 at 11:34
  • Maybe using time.sleep() you can get results. – keramat Feb 19 '20 at 14:09
  • anyone found a solution to this? when I run my code on my computer it goes ok, but when I do that on my prod environment (on AWS) I'm getting this 429 – Renato Bibiano Sep 13 '21 at 18:04
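
The time.sleep() suggestion in the comments above can be sketched as a small retry helper that waits longer after each 429. This is only an illustration: RateLimited and call_with_backoff are hypothetical names, not part of pytrends, and the doubling delay is an assumption rather than anything Google documents.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the 429 ResponseError raised by the library."""


def call_with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

In real code, fetch would wrap the pytrends call and the except clause would catch the ResponseError shown in the question. If Google has already blocked the IP, waiting a few seconds may not be enough.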

5 Answers

12

TL;DR: I solved the problem with a custom patch.

Explanation

The problem comes from Google's bot-detection system. Like other such systems, it stops serving requests that arrive too frequently from suspicious clients. One of the signals it uses to recognize trustworthy clients is the presence of specific headers generated by the JavaScript code on its web pages. The Python requests library offers no camouflage against this kind of detection, since it never executes JavaScript. The idea behind my patch is therefore to reuse the headers my browser generates when it interacts with Google Trends. The browser generates them while I am logged in to my Google account; in other words, the headers are linked to my account, so Google treats me as trustworthy.

Solution

I solved it as follows:

  1. First, use Google Trends from your web browser while logged in with your Google account;
  2. To track the actual HTTP GET request (I am using Chromium), go to "More Tools" -> "Developer Tools" -> "Network" tab;
  3. Visit the Google Trends page and search for a trend; this triggers a lot of HTTP requests, listed in the left sidebar of the "Network" tab;
  4. Identify the GET request (in my case it was /trends/explore?q=topic&geo=US), right-click on it, and select Copy -> Copy as cURL;
  5. Then go to this page, paste the cURL script on the left side, and copy the "headers" dictionary from the Python script generated on the right side;
  6. Then go to your code and subclass the TrendReq class, so you can pass the custom headers you just copied:
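from pytrends.request import TrendReq as UTrendReq

GET_METHOD = 'get'

# Paste the "headers" dictionary copied in step 5 here.
headers = {
...
}


class TrendReq(UTrendReq):
    # Override the internal _get_data so every request carries the copied headers.
    # Note that the request is always sent as GET, whatever `method` is passed in.
    def _get_data(self, url, method=GET_METHOD, trim_chars=0, **kwargs):
        return super()._get_data(url, method=GET_METHOD, trim_chars=trim_chars, headers=headers, **kwargs)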

  7. Remove any other "import TrendReq" from your code, so that it uses the subclass you just created;
  8. Try again;
  9. If the error message comes back in the future, repeat the procedure. You need to update the header dictionary with fresh values, and doing so may trigger the captcha mechanism.
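
For readers who would rather not paste a logged-in request into a web page, steps 4 and 5 can be roughly approximated locally. The sketch below is an assumption-laden illustration, not what the converter actually does: headers_from_curl is a hypothetical helper, it only reads -H/--header flags, and it does not handle the $'...' quoting that browsers sometimes emit.

```python
import shlex


def headers_from_curl(command):
    """Collect the -H/--header values of a cURL command into a dict."""
    # Join the backslash-newline continuations produced by "Copy as cURL".
    tokens = shlex.split(command.replace("\\\n", " "))
    headers = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            name, _, content = value.partition(":")
            headers[name.strip()] = content.strip()
    return headers
```

The returned dictionary plays the role of the headers variable in the subclass above. Treat it like a password, since it may include the cookie header of a logged-in session.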
  • Is it possible to use the requests_args argument at TrendReq to add the headers? Or do you need to edit the source? – GregOliveira Oct 18 '22 at 08:49
  • As far as I remember, it is not possible to pass the headers as an argument of TrendReq's constructor, so I applied this custom patch by overriding the _get_data method. Otherwise, yes, I would need to edit the source so that the headers are passed in the init or directly during the actual call. – Antonio Ercole De Luca Oct 18 '22 at 14:46
  • Looks like Google is blocking by IP, because I made this headers change and the blocking (HTTP error 429) remained. – bl79 Jan 08 '23 at 06:12
  • PyTrends now has an option to define headers. https://github.com/GeneralMills/pytrends/pull/553 – bl79 Jan 08 '23 at 06:23
9

This one took a while, but it turned out the library just needed an update. You can check out a few of the approaches I posted here, all of which resulted in status 429 responses:

https://github.com/GeneralMills/pytrends/issues/243

Ultimately, I got it working again by running the following command from my bash prompt to install the latest version from source:

pip install --upgrade --user git+https://github.com/GeneralMills/pytrends

Hope that works for you too.

EDIT:

If you can't upgrade from source you may have some luck with:

pip install pytrends --upgrade

Also, make sure you're running git as an administrator if on Windows.

lopezdp
  • @WWH98932 did you try restarting the kernel? – lopezdp Jun 07 '18 at 00:00
  • It returned the following error when I tried to upgrade it using your command: – WWH98932 Jun 07 '18 at 03:46
  • Cannot find command 'git' - do you have 'git' installed and in your PATH? – WWH98932 Jun 07 '18 at 03:47
  • Umm, yes. This works for me. It sounds like you do not have git installed, or your PATH variable is not set up to find the git command. You should have git + python3 installed; at least, that's my environment. I upgraded directly from their source. – lopezdp Jun 07 '18 at 04:13
  • @WWH98932 see the EDIT I added to the answer and see if that helps – lopezdp Jun 07 '18 at 04:18
  • @WWH98932 are you experiencing this error again by any chance? I'm getting it again. Google must've changed their algorithm again or something to prevent our research from happening... any news? – lopezdp Jul 12 '18 at 21:37
  • Running pip3.6 install --upgrade --user git+https://github.com/GeneralMills/pytrends forced the update from 4.4 to 4.5. Thank you! – Cody Tookode Jan 03 '19 at 07:05
  • @qpaycm lol! the struggle is real! I remember that particular problem being quite troublesome for me too back then... Here is the project that I used that library on, in case it helps give you any insight: https://github.com/lopezdp/MachineLearningResearch/blob/master/CryptoCurrencyResearch/PricingResearch.ipynb – lopezdp Jan 04 '19 at 01:44
  • Thanks to you, I finished my research in a few hours. Today, on the second run, I had to use a proxy after hitting a 429. Now I think that with a proxy carousel there will be a higher chance of getting a Max retries exceeded error. I guess I should handle it in exceptions.py? – Cody Tookode Jan 04 '19 at 16:21
  • @qpaycm It depends on what you are doing. I experienced the same thing even after correcting the issues as stated. What I had to do was play around with the number of requests I was making per iteration of my loop. In my specific case I was querying 60-day periods, and I just switched to 90-day periods to reduce the total number of iterations and stay under the 429 limit. I suspect you may have something similar, where you can just tweak the size of each request in order to reduce the total number of requests a bit... that's what I did anyway. – lopezdp Jan 04 '19 at 19:55
  • @lopezdp I was getting a Max retries exceeded error. Fixed it by editing request.py: 2 lines after `import requests` (line 8) add `from requests.adapters import HTTPAdapter` `from requests.packages.urllib3.util.retry import Retry`, and 2 lines after `s=requests.session()` (line 77) add `retry = Retry(connect=3, backoff_factor=0.5)` `adapter = HTTPAdapter(max_retries=retry)` – Cody Tookode Jan 05 '19 at 08:45
  • @lopezdp all thx to you my man! Forked it to add 2 parameters: retries and backoff_factor. A few ideas: split the process into threads with a queue, and add a proxy carousel class with its own thread and queue for never-ending analysis, but I don't know if I am actually going to do that :-p – Cody Tookode Jan 06 '19 at 08:24
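
The 60-day versus 90-day trick described in the comments above amounts to covering the same date range with fewer requests. A minimal sketch, assuming the "YYYY-MM-DD YYYY-MM-DD" timeframe string used in the question; split_timeframes is a hypothetical helper, not part of pytrends:

```python
from datetime import date, timedelta


def split_timeframes(start, end, window_days):
    """Split [start, end] into consecutive 'YYYY-MM-DD YYYY-MM-DD' strings."""
    frames = []
    cursor = start
    while cursor <= end:
        # Each window covers window_days days, clipped to the end date.
        stop = min(cursor + timedelta(days=window_days - 1), end)
        frames.append(f"{cursor.isoformat()} {stop.isoformat()}")
        cursor = stop + timedelta(days=1)
    return frames
```

Each string can then be passed as the timeframe argument of build_payload. As far as I know, Google Trends scales the values of each request independently, so results from different windows are probably not directly comparable without rescaling.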
3

I had the same problem even after updating the module with pip install --upgrade --user git+https://github.com/GeneralMills/pytrends and restarting Python.

However, the following solved the issue:

Instead of

pytrends = TrendReq(hl='en-US', tz=360, timeout=(10,25), proxies=['https://34.203.233.13:80',], retries=2, backoff_factor=0.1, requests_args={'verify':False})

I just ran:

pytrend = TrendReq()

Hope this can be helpful!

Simon PA
Kyzgaldak
1

After running the upgrade command via pip install, you should restart the Python kernel and re-import the pytrends library.
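
To check which version pip actually installed, something like the following may help. It uses only the standard library (Python 3.8+); installed_version is just an illustrative helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("pytrends:", installed_version("pytrends"))
```

This reports what is installed on disk. A kernel that imported pytrends before the upgrade still holds the old module until it is restarted.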

1

I was having the same issue and did something really similar to what Antonio Ercole De Luca did. For me, however, the issue was with the cookies and not the headers.

I created a subclass like Antonio did, but this time modifying the cookie method:

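from pytrends.request import TrendReq

# Cookies copied from the browser request; values are redacted here.
cookies = {
    "SEARCH_SAMESITE": "####",
    "SID": "####",
    # ... remaining cookies from the copied request
}


class CookieTrendReq(TrendReq):
    # Return only the NID entry of the dictionary above.
    def GetGoogleCookie(self):
        return dict(filter(lambda i: i[0] == 'NID', cookies.items()))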

And I used the same method to get the cookies as he did to get the headers:

  1. Visit trends.google.com;
  2. Open the developer tools and go to the Network tab;
  3. Make a search, and then right-click on the top GET request (it should look like explore?q=...);
  4. Copy the request as bash cURL;
  5. Paste this into curlconverter.com and get the cookies!
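
If the copied request carries its cookies as a single "cookie: ..." header rather than a ready-made dictionary, a few lines of Python can split it up. This is a sketch with placeholder values; cookies_from_header is a hypothetical helper, not part of pytrends or curlconverter.

```python
def cookies_from_header(cookie_header):
    """Turn 'a=1; b=2' (the value of a Cookie header) into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Placeholder values only; real ones come from the copied request.
all_cookies = cookies_from_header("SEARCH_SAMESITE=####; SID=####; NID=####")
nid_only = {k: v for k, v in all_cookies.items() if k == "NID"}
```

Here nid_only holds the same single NID entry that the GetGoogleCookie override above returns.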
Dylan Webb
  • You're a legend, this is the only thing that worked for me – User Apr 05 '23 at 13:39
  • For those using this approach: if it stops working after a few tries, try this: https://github.com/GeneralMills/pytrends/pull/553#issuecomment-1473742736 (make sure to uncomment the 'cookie' parameter in your headers dictionary when passing it in as a variable to requests_args of TrendReq). – User Apr 06 '23 at 10:51