
I am new to Python and am using the unofficial pytrends API to crawl Google Trends. I have 2,000+ keywords in a list called DNA and am trying to crawl data for each of them. When I run this code, it fails with "Google returned a response with code 429" even though I added time.sleep(1). Can anyone help me with this problem?

Below is my code:

# DNA is a list of 2000+ keywords (defined elsewhere)
from pytrends.request import TrendReq
import pandas as pd
import xlsxwriter
import time

pytrends = TrendReq(hl='en-US', tz=360)
Data = pd.DataFrame()

# Google Trends crawler
for i in range(len(DNA)):
    time.sleep(1)
    kw_list = [DNA[i]]
    pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')
    df = pd.DataFrame(pytrends.interest_over_time())

    # Set the Google Trends dates as the index on the first pass
    if i == 0:
        Googledate = pd.DataFrame(pytrends.interest_over_time())
        Data['Date'] = Googledate.index
        Data.set_index('Date', inplace=True)

    # Results
    if df.empty:
        Data[DNA[i]] = ""
    else:
        df.index.name = 'Date'
        df.reset_index(inplace=True)
        Data[DNA[i]] = df.loc[:, DNA[i]]

Data
EJ Kang

1 Answer


HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600

Too Many Requests

There is no official API for Google Trends, and Google has most likely placed a limit on the number of requests coming from the same IP address. Your options are to:

  1. Slow down until you figure out the limit (see the throttled sketch after this list).
  2. Run the crawler on several servers so the requests appear to come from different IP addresses.
  3. Stop trying to crawl Google for data they don't want to share.
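
As a minimal sketch of the first option, assuming the DNA keyword list from the question is already defined, something like the following throttles the crawl to one keyword per minute; the 60-second delay is only a guess at a safe rate, not a documented limit:

import time
import pandas as pd
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
Data = pd.DataFrame()

# DNA is assumed to be the 2000+ keyword list from the question
for keyword in DNA:
    pytrends.build_payload([keyword], cat=0, timeframe='today 5-y', geo='', gprop='')
    df = pytrends.interest_over_time()
    if not df.empty:
        Data[keyword] = df[keyword]  # columns share the same weekly date index
    time.sleep(60)  # wait a full minute before the next request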
Linda Lawton - DaImTo
  • Umm. This is for my individual research, so it would be difficult to use different IP addresses. What do you suggest for figuring out the limit? I'm very new to Python and the code above took me two days to write... :D – EJ Kang Nov 27 '17 at 12:47
  • Send 1 every minute. Do that for an hour and see what happens. – Linda Lawton - DaImTo Nov 27 '17 at 12:48
  • @DaImTo Thank you! That would be time.sleep(60), right? – EJ Kang Nov 27 '17 at 12:52
  • time.sleep(60) # Delay for 1 minute (60 seconds). – Linda Lawton - DaImTo Nov 27 '17 at 13:06
  • This is years after this question was posted, but dropping this here in case it's useful to anybody. I wouldn't take a simple time.sleep(60) approach; 60 s is usually a pretty long sleep time in scraping terms. I would look at an exponential sleep-and-retry solution to keep your scrape time down where possible. – rayad Feb 18 '23 at 23:21
  • Google recommends using exponential backoff, and most of the client libraries have it built in already. – Linda Lawton - DaImTo Feb 19 '23 at 20:51
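
A rough sketch of the exponential-backoff idea from the last two comments, wrapped around a single pytrends request. The helper name fetch_with_backoff is mine, the retry count and delays are illustrative, and the exception is caught broadly because the exact class pytrends raises on a 429 depends on the installed version:

import time
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)

def fetch_with_backoff(keyword, max_retries=5, base_delay=2):
    """Retry one Google Trends query, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            pytrends.build_payload([keyword], cat=0, timeframe='today 5-y')
            return pytrends.interest_over_time()
        except Exception:
            # pytrends raises an exception when Google answers with 429;
            # back off exponentially: 2 s, 4 s, 8 s, 16 s, 32 s
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("Giving up on %r after %d retries" % (keyword, max_retries))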