Python regular expression; match on the last instance

Question

I have a bunch of HTML I am trying to deal with. I want to delete the trailing half tag. Basically I am starting with:

</div></div><div class="_3o-d" id="education

and want to end with:

</div></div>

I tried:

workSection = re.split('<.*?$',workSection)[0]

but this matches the first '<' and leaves me with an empty string. Is there a way to just match the last instance? Or to somehow start from the end?

I am also aware that splitting and then taking the first option may not be the best way of doing this, and am prepared to take a beating for it now.

Obligatory http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Christian Ternus, Oct 30 '13 at 23:10
why are you trying to do that? depending on what you hope to accomplish there is probably a better way — Joran Beasley, Oct 30 '13 at 23:11
Basically, I am just trying to remove all the HTML, but first I wanted to do some splits on some specific tags. This left a couple of half tags. — Chase Roberts, Oct 30 '13 at 23:16
The problem with using a parser is that this HTML appears to be invalid — John La Rooy, Oct 30 '13 at 23:16
Ok it's only invalid because you are using regex to _break_ it then. Save yourself future headaches and use a proper parser — John La Rooy, Oct 30 '13 at 23:18

score 1 · Accepted Answer · answered Oct 30 '13 at 23:13


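Just use [^<] instead of the . so that the match cannot contain a second '<'. The regex engine tries start positions from left to right, and starting at the first '<' the lazy .*? is still allowed to grow until $ matches, which is why your pattern swallows the whole string.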

>>> re.split('<[^<]*$', '</div></div><div class="_3o-d" id="education')
['</div></div>', '']
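
If the goal is only to drop the trailing partial tag, something along these lines might be a little more direct than splitting and taking element 0. This is a sketch, not part of the original answer: the pattern is tightened to [^<>] on the assumption that a string already ending in a complete tag should be left untouched, and strip_partial_tag is a hypothetical helper name showing a regex-free variant built on str.rfind.

```python
import re

work_section = '</div></div><div class="_3o-d" id="education'

# Option 1: delete the trailing partial tag with re.sub.
# '<[^<>]*$' needs a '<' followed by no further '<' or '>' up to the end,
# so a string whose last tag is already closed is not modified.
trimmed_regex = re.sub(r'<[^<>]*$', '', work_section)


# Option 2 (hypothetical helper): no regex, just look at the last '<'.
def strip_partial_tag(text):
    last_open = text.rfind('<')
    # Cut only if a '<' exists and nothing closes it afterwards.
    if last_open != -1 and '>' not in text[last_open:]:
        return text[:last_open]
    return text


trimmed_plain = strip_partial_tag(work_section)

print(trimmed_regex)
print(trimmed_plain)
```

Note that the plain '<[^<]*$' pattern would also match a complete final tag such as '</div>', so the extra '>' in the character class only matters if the input does not always end in a half tag.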


John La Rooy
