
I am trying to solve a problem to remove HTML tags from a string. I realize that regular expressions are a better solution, but I'd like to figure out what is going wrong here.

The idea is to track whether we are inside a tag using the flag 'tag', updating its value as each character is examined.

The problem is, the value of tag is never changed:

def remove_tag(s):
    tag = True

    for c in s:
        print "c = %s" % c
        if (c == '<'):
            print 'start_tag'
            tag == True
            print tag
        elif (c == '>'):
            print 'end tag'
            tag == False
            print tag

Running:

remove_tag("<h1>Title</h1>")

Produces:

c = <
start_tag
True
c = h
c = 1
c = >
end tag
True
c = T
c = i
c = t
c = l
c = e
c = <
start_tag
True
c = /
c = h
c = 1
c = >
end tag
True
None

I am baffled as to why 'end tag' is printed but the value 'False' does not get assigned to tag.

james_dean
    You may wish to steer clear of the subject of using regular expressions with HTML on SO... ;) http://stackoverflow.com/a/1732454/83109 – David M Aug 06 '12 at 10:07

1 Answer


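These lines are comparisons, not assignments. Each one evaluates to a boolean that is immediately discarded, so tag keeps its initial value of True: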

tag == True
tag == False

You want assignment, with a single equals sign:

tag = True
tag = False
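
For illustration, here is a minimal sketch of how the loop might look once the assignments are fixed and the non-tag characters are actually collected. The name remove_tags, the in_tag flag, and the choice to start with the flag set to False are my assumptions, not part of the question's code. It is a naive character scan, so it will not cope with a '>' inside an attribute value, comments, or other real-world HTML.

```python
def remove_tags(s):
    """Return s with everything between '<' and '>' dropped (naive sketch)."""
    in_tag = False   # assumption: we start outside any tag
    kept = []        # characters that are not part of a tag

    for c in s:
        if c == '<':
            in_tag = True    # assignment (=), not comparison (==)
        elif c == '>':
            in_tag = False
        elif not in_tag:
            kept.append(c)   # keep only text found outside tags

    return ''.join(kept)
```

With this version, remove_tags("<h1>Title</h1>") returns 'Title'. Returning the result rather than printing it also avoids the trailing None seen in the question's output.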
DSM