I am trying to come up with a regular expression that will transform HTML like the following:
ORIGINAL HTML:
<td align="center"><p>line 1</p><p>line 2</p><p>line 3</p></td>
INTENDED HTML:
<td align="center">line 1<br />line 2<br />line 3</td>
Note that there are other <p>...</p> tags throughout the HTML document that must not be touched. I only want to replace <p>...</p> within a <td> or <th>.
I would also need a regex to reverse the process. Please note that these regular expressions have to work in VB/VBScript/Classic ASP, so although I can use lookaheads (which I think are the key here), I cannot use lookbehinds. Some regexes I've tried unsuccessfully are:
1. <td[^>]*>(<p>.+<\/p>)<\/td>
2. <td[^>]*>(<p>.+<\/p>)+?<\/td>
3. <td[^>]*><p>(?:(.+?)<\/p><p>(.+))+<\/p><\/td>
4. <td[^>]*>(<p>(?:(?!<\/p>)).*<\/p>)+?<\/td>
5. <td[^>]*>(?:<p>(.+?)<\/p>)*(?:<p>(.+)<\/p>)<\/td>
6. <td[^>]*>(?:<p>(.+?)<\/p>)(?:<p>(.+)<\/p>)*(?:<p>(.+)<\/p>)<\/td>
I can "cheat" and pull out the entire line and then parse it manually using standard VB string manipulation functions, but that's definitely not the most elegant, nor the fastest way. There has to be some way to do this in one shot using regexes.
Eventually I'd like to take...
<td align="center"><p><span style="color:#ff0000;"><strong>line 1</strong></span></p><p>line 2</p><p>line 3</p></td>
...and turn it into
<td align="center"><span style="color:#ff0000;"><strong>line 1</strong></span><br />line 2<br />line 3</td>
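For reference, here is a rough sketch of the two-step "cheat" I described above, written in Python only so it is easy to run; it is not the one-shot regex I am hoping for. The names CELL and p_to_br are made up for illustration, and it assumes that cells are not nested (no table inside a table) and that the <p> tags sit directly inside the cell. It uses no lookbehinds, so the patterns themselves should carry over to VBScript's RegExp, although there I would presumably have to loop over the matches returned by Execute, since as far as I know Replace cannot take a callback.

```python
import re

# Match one whole <td> or <th> cell.
# Group 1 = opening tag, group 2 = tag name, group 3 = cell contents.
# [\s\S] stands in for "any character including newlines", which avoids
# relying on a dot-matches-newline flag.
CELL = re.compile(r'(<(td|th)\b[^>]*>)([\s\S]*?)</\2>', re.IGNORECASE)


def p_to_br(html):
    """Replace <p>...</p> pairs with <br /> separators, inside table cells only."""

    def fix_cell(match):
        body = match.group(3)
        # Step 1: a closing </p> followed by an opening <p> becomes a line break.
        body = re.sub(r'</p>\s*<p\b[^>]*>', '<br />', body, flags=re.IGNORECASE)
        # Step 2: drop the remaining outer <p> and </p> tags.
        # \b keeps tags such as <pre> from being matched.
        body = re.sub(r'</?p\b[^>]*>', '', body, flags=re.IGNORECASE)
        return match.group(1) + body + '</' + match.group(2) + '>'

    return CELL.sub(fix_cell, html)


if __name__ == '__main__':
    sample = ('<p>untouched</p>'
              '<td align="center"><p>line 1</p><p>line 2</p><p>line 3</p></td>')
    print(p_to_br(sample))
```

The outer pattern limits the work to cell contents, so <p> tags elsewhere in the document are left alone. What I cannot figure out is how to fold both steps into a single pattern.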
Any ideas (besides not to do this with a regex, lol)?
Thank you!