0

I have a string containing HTML that I would like to parse with xmlparser. The problem is that some tags in the string are not well-formed, in particular the <img> tags, which are missing the final / (they should be written <img />). I would like to find every img tag and add the / at the end: for each <img in my text, find the next > and replace it with />, so that the string can be parsed.

Can anyone help me?

Thanks

kschaeffler

1 Answer

3

Patching tags by string replacement is asking for all kinds of trouble. Try a library that is better suited to the task; BeautifulSoup looks like it may be what you want.
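
For reference, the literal replacement described in the question could be sketched with Python's standard `re` module as below. This is only an illustration under a strong assumption: no `>` ever appears inside an attribute value, and `<img>` is the only malformed tag. The helper name `close_img_tags` is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Matches "<img", then any attributes up to the next ">", with an optional
# "/" just before it. Assumption: attribute values never contain ">".
IMG_TAG = re.compile(r"<img\b([^>]*?)\s*/?>", re.IGNORECASE)


def close_img_tags(html: str) -> str:
    """Rewrite every <img ...> or <img ... /> as a self-closed <img ... />."""
    return IMG_TAG.sub(r"<img\1 />", html)


html = '<p>x <img src="a.png"> y <img src="b.png" /></p>'
fixed = close_img_tags(html)

# The result is now well-formed enough for a strict XML parser.
root = ET.fromstring(fixed)
sources = [img.get("src") for img in root.iter("img")]
```

This breaks as soon as the input has another unclosed tag (`<br>`, `<hr>`, a missing `</p>`) or a `>` inside an attribute, which is why a real HTML parser is the safer route.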

If you are dead set on using xmlparser, then you might want to use BeautifulSoup to clean up the HTML first. See: How do I fix wrongly nested / unclosed HTML tags?
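
A minimal sketch of that clean-up step, assuming the `beautifulsoup4` package is installed and using Python's built-in `html.parser` backend. The function name `html_to_wellformed` and the sample markup are made up for illustration, and the standard library's `xml.etree.ElementTree` stands in here for whatever strict XML parser you actually use.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup


def html_to_wellformed(fragment: str) -> str:
    """Parse lenient HTML and serialize it back with void tags closed."""
    soup = BeautifulSoup(fragment, "html.parser")
    # str(soup) re-serializes the tree; void elements such as <img> and <br>
    # are written out in self-closing form.
    return str(soup)


dirty = '<div><p>text <img src="a.png"><br>more</p></div>'
cleaned = html_to_wellformed(dirty)

# The cleaned string can now be handed to a strict XML parser.
# Assumption: the fragment has a single root element, as XML requires.
root = ET.fromstring(cleaned)
img = root.find(".//img")
```

BeautifulSoup does the tolerant parsing and the XML parser only ever sees the re-serialized output, so no hand-written pattern for `<img` is needed.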

Community
We Are All Monica