
Regex to collapse every multi-line HTML td onto one line, EXCEPT tds that contain a nested table

I'm trying to do a find-and-replace in Visual Studio (though I could use another tool, such as WildEdit) to find all td tags that span multiple lines and put each one on a single line, removing all line breaks and tabs. The catch is that I don't want to do this to a parent td that has another table nested inside it.

So for example I want to transform this:

<table class="Top">
    <tr>
        <td class="TopLeft">
            <img src="img/spacer.gif" class="Size">
        </td>
        <td class="TopTile">
            <img src="img/spacer.gif" class="Size">
        </td>
        <td class="TopRight">
            <img src="img/spacer.gif" class="Size">
        </td>
    </tr>
    <tr>
        <td class="LeftTile">
            &nbsp;
        </td>
        <td class="TitleBar">
            Blah Blah Blah
        </td>
        <td class="RightTile">
            &nbsp;
        </td>
    </tr>
    <tr>
        <td class="LeftTile">
            &nbsp;
        </td>
        <td>
            <table cellpadding="2" cellspacing="0" border="0" class="EntryLight">
                <tr>
                    <td class="TopLeft">
                        <img src="img/spacer.gif" class="Size">
                    </td>
                    <td class="TopTile">
                        <img src="img/spacer.gif" class="Size">
                    </td>
                    <td class="TopRight">
                        <img src="img/spacer.gif" class="Size">
                    </td>
                </tr>
                <tr>
                    <td class="LeftTile">
                        &nbsp;
                    </td>
                    <td class="TitleBar">
                        Blah Blah Blah
                    </td>
                    <td class="RightTile">
                        &nbsp;
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

Into this:

<table class="Top">
    <tr>
        <td class="TopLeft"><img src="img/spacer.gif" class="Size"></td>
        <td class="TopTile"><img src="img/spacer.gif" class="Size"></td>
        <td class="TopRight"><img src="img/spacer.gif" class="Size"></td>
    </tr>
    <tr>
        <td class="LeftTile">&nbsp;</td>
        <td class="TitleBar">Blah Blah Blah</td>
        <td class="RightTile">&nbsp;</td>
    </tr>
    <tr>
        <td class="LeftTile">&nbsp;</td>
        <td>
            <table cellpadding="2" cellspacing="0" border="0" class="EntryLight">
                <tr>
                    <td class="TopLeft"><img src="img/spacer.gif" class="Size"></td>
                    <td class="TopTile"><img src="img/spacer.gif" class="Size"></td>
                    <td class="TopRight"><img src="img/spacer.gif" class="Size"></td>
                </tr>
                <tr>
                    <td class="LeftTile">&nbsp;</td>
                    <td class="TitleBar">Blah Blah Blah</td>
                    <td class="RightTile">&nbsp;</td>
                </tr>
            </table>
        </td>
    </tr>
</table>
Soteriologist
  • "Nesting" and "regex" are rather incompatible concepts. You can manage it with some dirty tricks iff you have Visual Studio 2012, but only then, because that's the first version that uses the .NET regex library. – Tim Pietzcker Oct 10 '12 at 21:11

1 Answer

This works for your example if you have Visual Studio 2012, the first version whose find-and-replace uses the .NET regex library:

Search for

(?<=<td[^>]*>)(?>\s+)(?!<table)|(?<!</table>\s*)\s+(?=</td>)

and replace all with nothing.

Explanation:

(?<=        # Assert that it's possible to match...
 <td[^>]*>  # an opening <td> tag
)           # right before the current position,
(?>\s+)     # then match one or more whitespace characters atomically (no backtracking),
(?!<table)  # but only if the next tag isn't an opening <table> tag.
|           # Or:
(?<!        # Unless we're right after...
 </table>   #  a closing </table> tag
 \s*        #  (which may be followed by whitespace),
)           #
\s+         # match one or more whitespace characters
(?=</td>)   # but only if a closing </td> tag follows immediately.
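
The same two-branch idea can also be tried outside Visual Studio. What follows is only a rough sketch in Python, not the pattern above verbatim: Python's built-in `re` module rejects variable-length lookbehind, so each lookbehind is swapped for a capture group, and the atomic group is emulated with an extra `\s` in the lookahead. The names `AFTER_OPEN`, `BEFORE_CLOSE` and `tighten_cells` are invented for this illustration.

```python
import re

# Branch 1: whitespace right after an opening <td ...>, unless <table comes next.
# The \s inside the negative lookahead plays the role of the atomic group:
# \s+ cannot hand back whitespace to slip past the <table check.
AFTER_OPEN = re.compile(r"(<td\b[^>]*>)\s+(?!\s|<table)", re.IGNORECASE)

# Branch 2: whitespace right before </td>, unless a </table> precedes it.
# The optional capture group stands in for the variable-length lookbehind.
BEFORE_CLOSE = re.compile(r"(</table>)?\s+(</td>)", re.IGNORECASE)


def tighten_cells(html: str) -> str:
    """Remove whitespace just inside <td>...</td>, leaving cells that wrap a table alone."""
    html = AFTER_OPEN.sub(r"\1", html)
    # If </table> was captured, put the whole match back untouched.
    return BEFORE_CLOSE.sub(
        lambda m: m.group(0) if m.group(1) else m.group(2), html
    )


if __name__ == "__main__":
    sample = (
        '<td class="LeftTile">\n    &nbsp;\n</td>\n'
        "<td>\n    <table>\n        <tr>\n"
        '            <td class="TitleBar">\n                Blah Blah Blah\n            </td>\n'
        "        </tr>\n    </table>\n</td>"
    )
    print(tighten_cells(sample))
```

Because the search runs left to right, a `</table>` is always reached before the whitespace that follows it, which is what lets the optional group do the job of the negative lookbehind.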
Tim Pietzcker
  • Something to note: I don't need to do it all in one pass; I can run multiple passes. I'm just trying to fix some formatting issues caused by the way IE handles extra/empty whitespace between td tags, which throws off the page layout. Firefox and Chrome don't have issues with it. But I have a big pile of code that I didn't write that uses nested tables for layout, a problem I'm now stuck with. =/ I also don't have Visual Studio 2012 yet; I'm running 2010 at the moment. – Soteriologist Oct 10 '12 at 23:20
  • I'm getting fairly good results with this: \]\>\n:b+[^<\]{.*}\n:b+\ Replace with: Problem is, visual studio doesn't seem to be keeping special characters very well in the tagged expressions. For example... This: Becomes:
    \2  
    – Soteriologist Oct 10 '12 at 23:22
  • Well, for now I've simplified things down to: pass1: \\n:b+{.*}\n:b+\ pass2: \\n:b+\ to cover the two most common variants of the multi-line tds that I'm dealing with. – Soteriologist Oct 11 '12 at 00:03
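
If Visual Studio 2012 isn't available, running a small script over the files is another route. The following is only a sketch and assumes Python 3 is at hand; the names `LEAF_TD` and `collapse_leaf_cells` are made up for illustration. Instead of lookbehind, it matches a whole td whose body contains no further `<td`, `<table`, or `</td>`, so a cell that wraps a nested table is skipped while the cells inside that table are still processed. It trims only the whitespace at the two ends of the cell body, and like any regex over HTML it may trip on comments or script blocks that contain tag-like text.

```python
import re

# A "leaf" cell: opening tag, a body that never steps over another <td,
# a <table, or a </td>, then the closing tag. A cell holding a nested
# table cannot match, because its body hits <table before any </td>.
LEAF_TD = re.compile(
    r"(<td\b[^>]*>)((?:(?!<td\b|<table\b|</td>).)*)(</td>)",
    re.DOTALL | re.IGNORECASE,
)


def collapse_leaf_cells(html: str) -> str:
    """Strip leading and trailing whitespace inside every td that has no nested table."""
    return LEAF_TD.sub(
        lambda m: m.group(1) + m.group(2).strip() + m.group(3), html
    )


if __name__ == "__main__":
    sample = (
        '<td class="LeftTile">\n    &nbsp;\n</td>\n'
        "<td>\n    <table>\n        <tr>\n"
        '            <td class="TitleBar">\n                Blah Blah Blah\n            </td>\n'
        "        </tr>\n    </table>\n</td>"
    )
    print(collapse_leaf_cells(sample))
```

A single pass is enough here because the pattern already covers both the cells with content on their own line and the cells that hold nothing but whitespace.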