These ids correspond to either dead links or empty pages: 1228, 1695, 1310, 1235, 1163, 1416, 1586, 1236, 1429, and 1627.

After running my code, two unwanted ids still appear. I use Jupyter Notebook. The unwanted ids that remain are different each time I re-run the kernel, e.g. 1310 and 1416, or 1695 and 1627.

What's the problem?

ids = ['1228', '1446', '1409', '1184', '1568', '1154', '1695', '1333', '1235', '1310', '1232', '1316', '1137', '1163', '1393', '1677', '1407', '1200', '1416', '1586', '1236', '1454', '1078', '1088', '1510', '1121', '1607', '1194', '1574', '1423', '1429', '1231', '1113', '1627', '1361', '1357', '1323']

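import requests
from bs4 import BeautifulSoup

# The function below takes an id, fetches the corresponding XML page,
# and returns the parsed soup (or None if the page is dead or empty)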
def get_poll_xml(id):
    id = str(id)
    source = requests.get('http://charts.realclearpolitics.com/charts/' + id + '.xml')

    if source.status_code == 200:
        soup = BeautifulSoup(source.text)
        if soup.series.value:
            return soup
        else:
            return None

    else:
        return None

# The following function takes a list of ids and removes the ones with dead or empty links
def check_for_real(ids):

    for id in ids:
        j = get_poll_xml(id)
        if j is None:
            ids.remove(id)

    return ids
  • Just one additional hint: docstrings in Python are usually written with `"""triple double quotes"""` and placed under the function's signature, not above it. If you are really eager, check https://www.python.org/dev/peps/pep-0257/ – Gregor Oct 17 '19 at 11:12

1 Answer

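You shouldn't modify a list while you're iterating over it. Each call to `ids.remove(id)` shifts the later elements one position to the left, so the loop skips the element that immediately followed the removed one. That is why an unwanted id survives whenever two of them sit next to each other in the list. Note also that `check_for_real` changes `ids` in place, so re-running the cell without redefining `ids` operates on the already-shortened list, which may be why the leftovers differ between runs. Iterate over a copy of the list instead.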
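To see the skipping without hitting the network, here is a minimal sketch. `is_dead` and the `DEAD` set are hypothetical stand-ins for `get_poll_xml(id) is None`; the sample ids are taken from the question, but which ones count as dead here is only an assumption for illustration.

```python
# Hypothetical stand-in for "get_poll_xml(id) is None": a fixed set of dead ids.
DEAD = {'1235', '1310', '1416', '1586', '1236'}

def is_dead(poll_id):
    return poll_id in DEAD

def remove_in_place(ids):
    """Buggy version: removes from the same list the loop is walking."""
    for poll_id in ids:
        if is_dead(poll_id):
            ids.remove(poll_id)
    return ids

def remove_from_copy(ids):
    """Walks a snapshot (list(ids)), so removals cannot disturb the loop."""
    for poll_id in list(ids):
        if is_dead(poll_id):
            ids.remove(poll_id)
    return ids

sample = ['1333', '1235', '1310', '1232', '1416', '1586', '1236', '1454']
buggy = remove_in_place(list(sample))
fixed = remove_from_copy(list(sample))

print(buggy)  # ['1333', '1310', '1232', '1586', '1454']
print(fixed)  # ['1333', '1232', '1454']
```

In the buggy run, 1310 and 1586 survive because each one directly follows an id that was just removed.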

User bluepnume offers a solution to this in another post:

Python: Removing list element while iterating over list

for id in list(ids):
    j = get_poll_xml(id)
    if j is None:
        ids.remove(id)
  • I think that this reference is more useful: https://stackoverflow.com/a/1207461/10581769. And why not go for list comprehensions: `[id for id in ids if get_poll_xml(id)]` – Gregor Oct 17 '19 at 10:57
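For what it's worth, the list-comprehension idea from the comment above could look roughly like the sketch below. `keep_live` and `fake_fetch` are hypothetical names used only for illustration; in the real code you would pass `get_poll_xml` as `fetch`. The explicit `is not None` test is a deliberate choice so the result does not depend on how the returned object behaves in a boolean context.

```python
def keep_live(ids, fetch):
    """Return a new list containing only the ids for which fetch(id) is not None."""
    return [poll_id for poll_id in ids if fetch(poll_id) is not None]

# Hypothetical stub in place of get_poll_xml, so the example runs offline.
def fake_fetch(poll_id):
    return None if poll_id in {'1228', '1695'} else '<xml/>'

ids = ['1228', '1446', '1695', '1409']
live = keep_live(ids, fake_fetch)

print(live)  # ['1446', '1409']
print(ids)   # ['1228', '1446', '1695', '1409'] (the original list is untouched)
```

Because this builds a new list rather than mutating `ids`, re-running the cell gives the same result every time.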