
So far I have written a basic program in Python 2.7, using urllib2 and re, that fetches the HTML of a website, prints it out, and indexes the occurrences of a keyword. I would like to build a much more complex and dynamic program that could gather data from websites, such as sports or stock statistics, and aggregate it into lists that could then be used for analysis in something like an Excel document. I'm not asking anyone to literally write the code for me; I just need help understanding how to approach it: whether I need extra libraries, and so on. Here is the current code; it is very simplistic as of now:

    import urllib2
    import re

    while True:
        url = str(raw_input("[[[Enter URL]]]"))
        keyword = str(raw_input("[[[Enter Keyword]]]"))
        try:
            req = urllib2.Request(url)
            response = urllib2.urlopen(req)
            page_content = response.read()
            # Start index of every occurrence of the keyword (escaped so it
            # is matched literally rather than as a regex pattern)
            idall = [m.start() for m in re.finditer(re.escape(keyword), page_content)]
            raw_input("")  # pause until Enter is pressed
            print(idall)
            raw_input("")  # pause until Enter is pressed
            print(page_content)

        except urllib2.URLError as e:  # HTTPError is a subclass of URLError
            print(e)
Orren Ravid

1 Answer


You can use requests to handle interaction with the website. Here is the link for it: http://docs.python-requests.org/en/latest/
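
For example, a minimal sketch of fetching a page with requests (http://example.com is just a placeholder URL):

    import requests

    # Fetch the page; raise_for_status() turns HTTP errors (4xx/5xx) into exceptions
    response = requests.get("http://example.com")
    response.raise_for_status()
    page_content = response.text  # the decoded HTML as a string
    print(page_content[:200])     # first 200 characters, as a sanity check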

Then you can use BeautifulSoup to handle the HTML content. Here is the link for it: http://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
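
A sketch of pulling data out of HTML with BeautifulSoup; the placeholder string and the td tag are only illustrations, since the real tags and classes depend on the site you scrape:

    from bs4 import BeautifulSoup

    # Placeholder HTML standing in for a fetched page
    html = "<table><tr><td>Alice</td><td>42</td></tr></table>"

    soup = BeautifulSoup(html, "html.parser")
    # Collect the text of every table cell into a list
    cells = [td.get_text(strip=True) for td in soup.find_all("td")]
    print(cells)  # ['Alice', '42']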

They're easier to use than urllib2 and re. Hope it helps.
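
Since you mention Excel: one common option (not tied to these libraries, just the standard-library csv module) is to write the aggregated lists to a CSV file, which Excel opens directly. A sketch with placeholder data:

    import csv

    # `rows` stands in for whatever lists of values you aggregate while scraping
    rows = [["player", "score"], ["Alice", "42"], ["Bob", "17"]]

    # On Python 2.7 the csv module wants the file opened in binary mode
    with open("stats.csv", "wb") as f:
        csv.writer(f).writerows(rows)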

Stephen Lin