I need to process HTML and make sure it is valid XHTML. For instance, empty elements such as <br> and <hr> must be self-closed in XHTML, i.e. written as <br /> and <hr /> respectively.

To take care of this, I converted my HTML to text and replaced all <br> tags with <br /> and all <hr> tags with <hr />.

Now the issue is that some <hr> tags have attributes. For example:

<hr width="100%" size="3" align="center" style="color: rgb(153,153,153);">

In this case the replacement becomes more complicated, as I cannot simply use

str = str.Replace("<hr>","<hr/>")

Is there an easier way than writing a function that searches for every occurrence of "<hr", then looks for the following ">" and replaces it with "/>"?
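
For illustration, here is a minimal regex-based sketch of the search-and-replace described above, written in Python rather than the language of the snippet in the question. The names `VOID_TAGS` and `self_close_void_tags` are hypothetical, and the pattern assumes reasonably well-formed tags with balanced attribute quotes. It is not a parser: text inside comments, CDATA, or scripts that happens to look like a tag would be rewritten too, so an HTML parser is likely the more robust route.

```python
import re

# Tags to rewrite; hypothetical list, extend as needed (e.g. "img", "input").
VOID_TAGS = ("br", "hr")

# Matches <br>, <hr attr="...">, and already self-closed forms like <br/>.
# Quoted attribute values are consumed as a unit, so a ">" inside quotes
# does not end the tag early.
_VOID_TAG_RE = re.compile(
    r"<(" + "|".join(VOID_TAGS) + r")(?=[\s/>])"
    r"((?:\"[^\"]*\"|'[^']*'|[^'\">])*?)\s*/?>",
    re.IGNORECASE,
)


def self_close_void_tags(html: str) -> str:
    """Rewrite each listed tag as '<tag attrs />', keeping its attributes."""
    return _VOID_TAG_RE.sub(lambda m: f"<{m.group(1)}{m.group(2)} />", html)


if __name__ == "__main__":
    sample = '<hr width="100%" size="3" align="center" style="color: rgb(153,153,153);">'
    print(self_close_void_tags(sample))
```

Applied to the <hr> example from the question, this keeps all four attributes and only changes the closing ">" to " />". Tags that are already self-closed are normalized to the same form rather than being closed twice.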

Airspeed Velocity
Coding Duchess
  • You're probably looking for some sort of [HTML parser](http://stackoverflow.com/questions/20421316/what-does-html-parsing-mean). – Eminem Apr 06 '15 at 16:27
  • You could go for regular expressions; that would make it a lot easier to replace the contents :) – Icepickle Apr 06 '15 at 17:18
  • Icepickle, could you help me with regular expressions? – Coding Duchess Apr 06 '15 at 17:52
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Mr Lister Apr 07 '15 at 06:41

0 Answers