class MyHTMLParser(HTMLParser):
    start = False;
    counter = 0;
    ...
This does not do what you think it does!
In Java, C#, or similar languages, what the analogous code does is declare that the class of objects known as MyHTMLParser all have an attribute start with the initial value of False, and counter with the initial value of 0.
In Python, classes are objects too. They have their own attributes, just like every other object. So what the above does in Python is create a class object named MyHTMLParser, with an attribute start set to False and an attribute counter set to 0. [1]
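A minimal sketch of this, assuming Python 3 and using a hypothetical Counter class that is not part of the question, shows the attribute living on the class object and only being visible through the instances:

```python
class Counter(object):
    # The class body runs once, when the class is created, and stores
    # `count` on the class object itself.
    count = 0

a = Counter()
b = Counter()

# The instances have no `count` of their own; lookups fall back to the class.
print('count' in vars(Counter))  # True
print('count' in vars(a))        # False

# Changing the class attribute is visible through every instance.
Counter.count = 5
print(a.count, b.count)          # 5 5
```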
Another thing to keep in mind is that there is no way whatsoever to make an assignment to a bare name like start = True set an attribute on an object. It always sets a variable named start. [2]
So your class contains no code that ever sets any attributes on any of your MyHTMLParser instances; the code in the class body is setting attributes on the class object itself, and the code in handle_starttag is setting local variables which are then discarded when they fall out of scope.
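To see the difference in isolation, here is a small sketch with a hypothetical Flag class (the class and method names are made up for illustration). Inspecting the instance with vars() shows which assignment actually stored something on it:

```python
class Flag(object):
    def set_wrong(self):
        # Binds a local variable named `start`; it is gone once the
        # method returns.
        start = True

    def set_right(self):
        # Sets the `start` attribute on this particular instance.
        self.start = True

f = Flag()

f.set_wrong()
after_wrong = dict(vars(f))
print(after_wrong)   # {}

f.set_right()
after_right = dict(vars(f))
print(after_right)   # {'start': True}
```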
Your code in handle_data is reading from a plain variable named start (which you never set), for similar reasons. In Python there is no way to read an attribute without specifying in which object to look for it. A bare start always refers to a variable, either in the local function scope or some outer scope. You need self.start to read the start attribute of the self object.
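The read side can be sketched the same way. The Reader class below is hypothetical; it assumes no variable called start exists anywhere else in the program, which is why the bare lookup fails with a NameError even though the instance has a start attribute:

```python
class Reader(object):
    def __init__(self):
        self.start = True

    def read_wrong(self):
        # `start` is looked up as a variable (local, enclosing, global,
        # built-in); the instance is never consulted.
        return start

    def read_right(self):
        # Names the object to look in, so the attribute is found.
        return self.start

r = Reader()

error = None
try:
    r.read_wrong()
except NameError as exc:
    error = str(exc)
print(error)

print(r.read_right())
```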
Remember, the def block defining a method is nothing special; it's a function like any other. It's only later, when that function happens to be stored in an attribute of a class object, that the function can be classified as a method. So the self parameter behaves the same as any other parameter, and indeed any other name. It doesn't have to be named self (though that's a wise convention to follow), and it has no special privileges making reads and writes of bare names look for attributes of self.
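As an illustration only, the following sketch defines an ordinary function outside any class and then stores it on a hypothetical Parser class. The parameter is deliberately called this rather than self; every name here is invented for the example:

```python
def describe(this):
    # An ordinary function. Its first parameter is not called `self`,
    # which shows the name carries no special meaning.
    return 'tag count: %d' % this.counter

class Parser(object):
    def __init__(self):
        self.counter = 3

# Storing the function in a class attribute is what makes it usable
# as a method.
Parser.describe = describe

p = Parser()
print(p.describe())   # tag count: 3
print(describe(p))    # tag count: 3
```

Calling p.describe() and describe(p) runs the same function; the only difference is that the method call passes p as the first argument for you.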
So:
Don't define your attributes with their initial values in the class block; that's for values which are shared by all instances of the class, not attributes of each instance. Instance attributes can only be initialised once you have a reference to the particular instance; most commonly this is done in the __init__ method, which is called as soon as the object exists.
You must specify in which object you want to read or write attributes. This applies always, in every context. In particular, you will usually refer to attributes inside methods as self.attribute.
Applying that (and eliminating the semicolons, which you don't need in Python):
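from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        # Initialise the base class first; it keeps state of its own.
        super().__init__()
        self.start = False
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names converted to lower case.
        if tag == 'tbody':
            self.start = True
            self.counter += 1

    def handle_data(self, data):
        if self.start:
            print(data)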
[1] The methods handle_starttag and handle_data are also nothing more than functions which happen to be attributes of an object that is used as a class.
[2] Usually a local variable; if you've declared start to be global or nonlocal then it might be an outer variable. But it's definitely not an attribute on some object you happen to have nearby, even if that other object is bound to the name self.
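A rough sketch of the nonlocal case from the second footnote, assuming Python 3 (nonlocal does not exist in Python 2). The make_toggle function and Holder class are hypothetical; the point is that the assignment reaches the enclosing function's variable and still leaves the instance without any attributes:

```python
def make_toggle():
    start = False  # variable in the enclosing function's scope

    class Holder(object):
        def flip(self):
            # `nonlocal` redirects the bare name to the enclosing
            # function's variable. It does not touch attributes of `self`.
            nonlocal start
            start = True

    h = Holder()
    h.flip()
    return start, dict(vars(h))

outer_value, instance_attrs = make_toggle()
print(outer_value)      # True
print(instance_attrs)   # {}
```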