I am trying to scrape specific data from (website here) to build a list of extra statistics for a computer game, purely for my own interest. However, whenever I attempt to scrape the data, I get the following error:

HTTP Error 429: Restricted

I looked up the error, and the description says: "The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes." As a result, I can no longer open any page on this website.

Here is my current code:

import urllib.request

try:
    # The URL to scrape (placeholder)
    url = 'website here'
    # Spoof a browser User-Agent so the request doesn't look like a bot
    headers = {}
    headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'
    # Build the request with the headers attached, then open and read it
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    print(respData)

except Exception as e:
    print(str(e))

Changing the "User-Agent" wasn't successful. Since the error refers to sending too many requests, is it possible to add a delay somewhere in my script? I was also thinking I could write the page to a file on my computer in chunks over time. Any ideas?

And I am somewhat new to Python/scraping, so try to keep it in simple terms :) Thank you!

Note: I have Python 3.4 and 2.7

• I would expect `python` to have a `sleep` function. As far as your 429 error goes, you're probably banned by IP address. Sometimes the ban is permanent and sometimes they clear it after X days; it really depends on the site. You are probably violating the Terms of Service that we all click through without reading. Good luck. – shellter May 25 '15 at 13:14
• How to avoid HTTP error 429 ... http://stackoverflow.com/questions/22786068/how-to-avoid-http-error-429-too-many-requests-python – Dušan Maďar May 25 '15 at 20:44

0 Answers