Index Error When Running Basic Web Scrape in Python

Question

I'm using Python 2.7. When I run this code, it fails at print findPatTitle[i], and Python reports "IndexError: list index out of range". I took this code from the 13th Python tutorial on YouTube, and I'm pretty sure it is identical, so I don't understand why I would get a range problem. Any ideas?

from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re

webpage = urlopen('http://feeds.huffingtonpost.com/huffingtonpost/LatestNews').read()

patFinderTitle = re.compile('<title>(.*)<title>')

patFinderLink = re.compile('<link rel.*href="(.*)" />')

findPatTitle = re.findall(patFinderTitle,webpage)
findPatLink = re.findall(patFinderLink,webpage)

listIterator = []
listIterator[:] = range(2,16)

for i in listIterator:
    print findPatTitle[i]
    print findPatLink[i]
    print "\n"

Why are you using regex to parse the html when you have BeautifulSoup? o.O You shouldn't parse html with regex... http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not — naeg, Sep 06 '11 at 06:11
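
As a rough sketch of the parser-based approach this comment recommends: the example below uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, purely so it has no third-party dependency. It assumes the feed is well-formed, Atom-style XML, and sample_feed is a made-up stand-in, so the real feed's structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down Atom feed; the real feed's structure may differ.
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First story</title>
    <link rel="alternate" href="http://example.com/first" />
  </entry>
  <entry>
    <title>Second story</title>
    <link rel="alternate" href="http://example.com/second" />
  </entry>
</feed>"""

# Atom elements live in this XML namespace, so every tag lookup needs it.
ATOM = "{http://www.w3.org/2005/Atom}"

root = ET.fromstring(sample_feed)
stories = []
for entry in root.findall(ATOM + "entry"):
    title = entry.find(ATOM + "title").text
    link = entry.find(ATOM + "link").get("href")
    stories.append((title, link))

for title, link in stories:
    print(title)
    print(link)
```

Because each title and link is read from the same entry element, the two values stay paired and no positional indexing is needed.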

score 0 · Answer 1 · answered Sep 06 '11 at 03:35

If your regexes matched the title and link tags, findall would return a list of the matched strings. In that case, you can simply iterate over each list and print its items.

Like:

for title in findPatTitle:
    print title

for link in findPatLink:
    print link

You get the IndexError because the loop reads indices 2 through 15 (range(2, 16) stops before 16), and at least one of the two lists, titles or links, has fewer than 16 elements.
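
To make the failure concrete, here is a minimal sketch that runs the question's title pattern against a small made-up feed snippet. The sample_feed string and the variable names are assumptions for illustration, not the real feed. It is written to run on Python 3; on 2.7 the single-argument print calls behave the same way.

```python
import re

# Hypothetical stand-in for the downloaded feed; the real content will differ.
sample_feed = (
    "<title>Feed title</title>\n"
    "<title>First story</title>\n"
    "<title>Second story</title>\n"
)

# Pattern as written in the question: the closing tag lacks its slash.
broken_pattern = re.compile('<title>(.*)<title>')
# Same pattern with the closing tag spelled </title>.
fixed_pattern = re.compile('<title>(.*)</title>')

broken_titles = broken_pattern.findall(sample_feed)
fixed_titles = fixed_pattern.findall(sample_feed)

print(len(broken_titles))  # nothing matched, so any index is out of range
print(fixed_titles)

# Guard the lookup instead of assuming a fixed number of matches.
index = 2
if index < len(fixed_titles):
    print(fixed_titles[index])
else:
    print("only %d titles found" % len(fixed_titles))
```

Checking len() before indexing, as in the last few lines, turns a crash into a readable message whenever the feed holds fewer items than expected.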

Note that listIterator[:] = range(2,16) is an unnecessarily roundabout way to set up the loop. You can iterate over the range directly:

for i in range(2, 16):
    print i  # use i here
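
An index-free variant of the same loop, as a sketch: slicing past the end of a list never raises IndexError, and zip stops at the shorter input, so the loop prints whatever items exist at positions 2 through 15. The list contents below are made-up placeholders standing in for the real findall results.

```python
# Hypothetical results standing in for findPatTitle and findPatLink.
findPatTitle = ["title %d" % n for n in range(5)]
findPatLink = ["http://example.com/%d" % n for n in range(4)]

# Slices that run past the end are simply shorter; zip stops at the shorter one.
pairs = list(zip(findPatTitle[2:16], findPatLink[2:16]))

for title, link in pairs:
    print(title)
    print(link)
    print("")
```

With five titles and four links, this prints two title/link pairs and then stops quietly. If a mismatch in list lengths should be treated as an error, compare len(findPatTitle) and len(findPatLink) before looping.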

Thanks for the tip. I had a problem in my code, findPatTitle should have been <title>(.*)</title>. Sorry about that. — Burton Guster, Sep 06 '11 at 03:52
