I am working on extracting and manipulating data from well-formed HTML in one of our legacy systems. I need to use regex to parse the HTML, find certain patterns, extract the data, and return modified HTML. I know that regex is rarely the right tool for parsing HTML but, given that I know exactly where the data is coming from and that the data is properly structured, I am confident that it will work in this particular situation.
The HTML that I am working with has the following pattern:
<i>Name1</i>: Some text goes here<br/>
<i>Name2</i>: Some different text goes here<br/>
<i>Name3</i>: Some other different text goes here<br/>
I need to change the HTML to the following:
<i>Name1</i><p>Some text goes here</p>
<i>Name2</i><p>Some different text goes here</p>
<i>Name3</i><p>Some other different text goes here</p>
Basically, I want to take the text that follows each name, wrap it in a p tag, and remove both the colon and the trailing br.
I want to do something like the following:
Dim html As String = [The HTML goes here]
html = Regex.Replace(html, "</i>:(.+?)<br\s*\/?>", "</i><p>(.+?)</p>", RegexOptions.Multiline)
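but it isn't working: the replacement string is inserted literally, so the output contains the text (.+?) instead of the captured text.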
In VB.NET, how do I replace every matching instance of the old HTML with the new HTML while keeping the captured text?