I am using Python to retrieve HTML from a webpage and then parse it with the MyHtmlParser class. If I find certain data in the HTML, I want to add it to the links list and return it to the main() function.

import urllib2
from MyHtmlParser import MyHtmlParser


def HtmlRetrieve(url):
    req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    return con.read()


def main():
    url = "someUrl.com"

    html = HtmlRetrieve(url)

    parser = MyHtmlParser()
    parser.feed(html)
    print parser.links

main()

This is my MyHtmlParser class:

from HTMLParser import HTMLParser


class MyHtmlParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_data(self, data):
        if data == "some text":
            self.links.append(data)

The code above appends the data to self.links inside the parser, but in main() parser.links is empty. What do I need to do to get the data from MyHtmlParser back to main()?
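For what it's worth, an attribute set on the parser instance inside handle_data should be readable from the caller after feed(), so one possible cause of the empty list is that the exact comparison never matches. This is an assumption, since the page's HTML is not shown: handle_data receives the text exactly as it appears between tags, including surrounding whitespace and newlines. A minimal sketch under that assumption, written for Python 3 (where the class lives in html.parser) and using a hypothetical sample string in place of the downloaded page:

```python
# Python 3; in Python 2 the import is `from HTMLParser import HTMLParser`.
from html.parser import HTMLParser


class MyHtmlParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_data(self, data):
        # Text nodes arrive with their original whitespace, so strip
        # before comparing; "\n  some text\n" would fail an exact match.
        text = data.strip()
        if text == "some text":
            self.links.append(text)


# Hypothetical sample HTML standing in for the real page.
sample_html = "<ul><li>\n  some text\n</li><li>other</li></ul>"

parser = MyHtmlParser()
parser.feed(sample_html)
parser.close()

# The list filled in by handle_data is visible on the same instance.
print(parser.links)
```

If the list is still empty with stripping in place, printing repr(data) inside handle_data would show what the parser is actually passing in.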

  • To fix the TypeError, you need to pass the instance to the parent class's `__init__` method: `HTMLParser.__init__(self)`. – Marius Jan 20 '16 at 04:39
  • Or use `super(MyHtmlParser, self).__init__()`, or `super().__init__()` in Python 3; see http://stackoverflow.com/questions/222877/how-to-use-super-in-python – West Jan 20 '16 at 05:07
  • That fixes the type error, but how do I get the data to be returned to my main() method? – Dan Stirling-Talbert Jan 21 '16 at 03:07
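On returning the data to main(): one option is to wrap the parsing in a small function that builds the parser, feeds it, and returns its list. The sketch below assumes Python 3; the helper name extract_links and the sample bytes are hypothetical and not part of the original code. In Python 3, urlopen(...).read() returns bytes while feed() expects str, so the helper decodes first, assuming UTF-8.

```python
from html.parser import HTMLParser


class MyHtmlParser(HTMLParser):
    def __init__(self):
        # Python 3 form; HTMLParser.__init__(self) as in the question also works.
        super().__init__()
        self.links = []

    def handle_data(self, data):
        if data == "some text":
            self.links.append(data)


def extract_links(raw, encoding="utf-8"):
    """Hypothetical helper: parse page content and return the matches."""
    # Decode bytes (as returned by a download) before feeding the parser.
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    parser = MyHtmlParser()
    parser.feed(text)
    parser.close()
    return parser.links


def main():
    # Hypothetical bytes standing in for the downloaded page.
    links = extract_links(b"<p>some text</p><p>other text</p>")
    print(links)
    return links


main()
```

Keeping the parser local to the helper means main() only ever sees a plain list, and each call starts from a fresh, empty links list.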
