I've written the following code in python that goes to the url in the array and finds specific info about that page - a web scraper of sorts. This one takes in an array of Reddit threads and outputs the score of each thread. This program almost never executes completely. Usually, i'll get through 5 or so iterations before receiving the error message below. Could someone please help me get to the bottom of this?
import urllib2
from bs4 import BeautifulSoup
urls = ['http://www.reddit.com/r/videos/comments/1i12o2/soap_precursor_to_a_lot_of_other_hilarious_shows/', 'http://www.reddit.com/r/videos/comments/1i12nx/kid_reporter_interviews_ryan_reynolds/', 'http://www.reddit.com/r/videos/comments/1i12ml/just_my_two_boys_going_full_derp_shocking_plot/']
for x in urls:
f = urllib2.urlopen(x)
data = f.read()
soup = BeautifulSoup(data)
span = soup.find('span', attrs={'class':'number'})
print '{}:{}'.format(x, span.text)
The error message I am getting is:
Traceback (most recent call last):
File "C:/Users/jlazarus/Documents/YouTubeparse2.py", line 7, in <module>
f = urllib2.urlopen(x)
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Unknown