These are the ids that correspond to dead links or empty pages: 1228, 1695, 1310, 1235, 1163, 1416, 1586, 1236, 1429, and 1627.
After running my code, two of these unwanted ids still appear. I am using Jupyter Notebook. The ids that reappear are different every time I re-run the kernel, e.g. 1310 and 1416, or 1695 and 1627.
What's the problem?
import requests
from bs4 import BeautifulSoup

ids = ['1228', '1446', '1409', '1184', '1568', '1154', '1695', '1333', '1235', '1310', '1232', '1316', '1137', '1163', '1393', '1677', '1407', '1200', '1416', '1586', '1236', '1454', '1078', '1088', '1510', '1121', '1607', '1194', '1574', '1423', '1429', '1231', '1113', '1627', '1361', '1357', '1323']

# The function below takes an id, fetches the corresponding XML page,
# and returns the parsed soup, or None if the page is dead or has no data
def get_poll_xml(id):
    id = str(id)
    source = requests.get('http://charts.realclearpolitics.com/charts/' + id + '.xml')
    if source.status_code == 200:
        soup = BeautifulSoup(source.text)
        if soup.series.value:
            return soup
        else:
            return None
    else:
        return None

# The following function takes a list of ids and removes the ones with dead or empty links
def check_for_real(ids):
    for id in ids:
        j = get_poll_xml(id)
        if j is None:
            ids.remove(id)
    return ids
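The likely cause is that `check_for_real` calls `ids.remove(id)` on the same list the `for` loop is iterating over. A list iterator walks by index, so when an item is removed, the next item slides into the current slot and is never examined. The sketch below reproduces this without network access: `is_dead` is a hypothetical stand-in for `get_poll_xml(id) is None`, and `DEAD` is simply the set of ids listed in the question.

```python
# The id list and the dead ids, copied from the question.
IDS = ['1228', '1446', '1409', '1184', '1568', '1154', '1695', '1333', '1235', '1310',
       '1232', '1316', '1137', '1163', '1393', '1677', '1407', '1200', '1416', '1586',
       '1236', '1454', '1078', '1088', '1510', '1121', '1607', '1194', '1574', '1423',
       '1429', '1231', '1113', '1627', '1361', '1357', '1323']
DEAD = {'1228', '1695', '1310', '1235', '1163', '1416', '1586', '1236', '1429', '1627'}


def check_for_real_buggy(ids, is_dead):
    """Same pattern as the question: removes items from the list being
    iterated over, so the item right after each removed one is skipped."""
    for poll_id in ids:
        if is_dead(poll_id):
            ids.remove(poll_id)
    return ids


def check_for_real(ids, is_dead):
    """Fixed version: build a new list instead of mutating the one being
    iterated over, so every id is checked exactly once."""
    return [poll_id for poll_id in ids if not is_dead(poll_id)]


# DEAD.__contains__ plays the role of `get_poll_xml(id) is None`.
leaked = sorted(set(check_for_real_buggy(list(IDS), DEAD.__contains__)) & DEAD)
print(leaked)  # ['1310', '1586']
print(check_for_real(IDS, DEAD.__contains__))  # contains none of the dead ids
```

With a deterministic check, 1310 slips through because it sits right after 1235, and 1586 because it sits right after 1416. In the real code, `is_dead` would be `lambda i: get_poll_xml(i) is None`. If a caller relies on the original list object being updated, `ids[:] = check_for_real(ids, is_dead)` replaces its contents in place. Because the leaked pair changes between runs in the question, some pages may also be failing intermittently, which would change which ids end up right after a removed one.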
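Separately from the list problem, `get_poll_xml` can raise instead of returning `None`: if a page is valid XML but has no `<series>` element, `soup.series` is `None` and `.value` raises `AttributeError`, and a connection error from `requests.get` propagates as well. Below is a minimal sketch of a more defensive version. It swaps BeautifulSoup for the standard library's `xml.etree.ElementTree`, assumes the chart XML nests `<value>` elements inside `<series>` (as the original `soup.series.value` check implies), and takes the HTTP function as a hypothetical `fetch` parameter so it can be tried offline.

```python
import xml.etree.ElementTree as ET

BASE_URL = 'http://charts.realclearpolitics.com/charts/'


def parse_poll_xml(text):
    """Return the root element if some <series> contains a <value>,
    otherwise None (empty, malformed, or data-less page)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:  # empty body or invalid XML
        return None
    # iter() also visits the element itself, so a <series> root is found too.
    for series in root.iter('series'):
        # Compare with None explicitly: an Element's truth value depends on
        # whether it has children, not on whether it exists.
        if next(series.iter('value'), None) is not None:
            return root
    return None


def get_poll_xml(poll_id, fetch):
    """fetch is a hypothetical parameter: any callable like requests.get that
    returns an object with .status_code and .text."""
    try:
        source = fetch(BASE_URL + str(poll_id) + '.xml')
    except OSError:  # requests.RequestException subclasses OSError (IOError)
        return None
    if source.status_code != 200:
        return None
    return parse_poll_xml(source.text)
```

Against the live site this could be called as `get_poll_xml('1446', fetch=lambda url: requests.get(url, timeout=10))`; the timeout keeps a stalled request from hanging the notebook.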