I have html like this:

<table class="tbNoBorder" ..some attr here...>
<tr><td>1</td></tr><tr><td>2</td></tr>
</table>

<table ..some attr here... >
<tr><td>1</td></tr><tr><td>2</td></tr>
</table>

I need to convert it, using Regex, to:

<table class="tbNoBorder" cellspacing="0" cellpadding="0" ..some attr here...>
<tr><td style="padding: 5px;">1</td></tr><tr><td style="padding: 5px;">2</td></tr>
</table>

<table cellspacing="0" cellpadding="0" ..some attr here... >
<tr><td style="border: solid 1px #ccc; padding: 5px;">1</td></tr><tr><td style="border: solid 1px #ccc; margin: 0; padding: 5px;">2</td></tr>
</table>

I then convert the HTML to Word, which is why I need this conversion. Tables that have the class tbNoBorder must not have any borders.

I wrote the code below to do this, but every table comes out with borders: the first regex matches all tables. Any ideas on how to get it to work?

        //Fixes tables with borders
        content = Regex.Replace(content,
            @"<table(.*?)(?!tbNoBorder)(.*?)>(.*?)</table>",
            m =>
                {
                    var tableContent = Regex.Replace(m.Groups[3].ToString(), 
                                        @"<td",
                                        t => "<td style=\"border: solid 1px #ccc; padding: 5px;\"", RegexOptions.IgnoreCase
                                        );
                    return "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1] + m.Groups[2] + ">" + tableContent + "</table>";
                }, RegexOptions.IgnoreCase
            );

        //Fixes tables without borders, has class tbNoBorder
        content = Regex.Replace(content,
            @"<table(.*?)tbNoBorder(.*?)>(.*?)</table>",
            m =>
            {
                var tableContent = Regex.Replace(m.Groups[3].ToString(),
                                    @"<td",
                                    t => "<td style=\"padding: 5px;\"", RegexOptions.IgnoreCase
                                    );
                return "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1] + m.Groups[2] + ">" + tableContent + "</table>";
            }, RegexOptions.IgnoreCase
        );
Brian Tompsett - 汤莱恩
podeig

2 Answers

Change your first regex to

@"<table(?![^>]*tbNoBorder)(.*?)>(.*?)</table>"

With the negative lookahead placed directly after `<table`, the match fails whenever tbNoBorder appears anywhere in the opening tag.
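
A minimal sketch of how both passes might look with this lookahead. `TableFixer`, `Fix` and `StyleCells` are hypothetical names used only for illustration. Two things differ from the code in the question and are assumptions rather than part of the fix itself: `RegexOptions.Singleline` is added so that `.` also matches the line breaks inside the sample tables, and the attribute group is written as `[^>]*` so it cannot run past the opening tag. Nested tables are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableFixer
{
    // Singleline lets '.' match newlines; the sample tables span several lines.
    private const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Adds a style attribute to every <td in the given table body.
    private static string StyleCells(string rows, string style)
    {
        return Regex.Replace(rows, @"<td\b",
            _ => "<td style=\"" + style + "\"", RegexOptions.IgnoreCase);
    }

    public static string Fix(string content)
    {
        // Pass 1: tables whose opening tag does NOT contain tbNoBorder.
        content = Regex.Replace(content,
            @"<table(?![^>]*tbNoBorder)([^>]*)>(.*?)</table>",
            m => "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1].Value + ">"
                 + StyleCells(m.Groups[2].Value, "border: solid 1px #ccc; padding: 5px;")
                 + "</table>",
            Options);

        // Pass 2: tables whose opening tag DOES contain tbNoBorder.
        // The positive lookahead keeps the class attribute inside group 1.
        content = Regex.Replace(content,
            @"<table(?=[^>]*tbNoBorder)([^>]*)>(.*?)</table>",
            m => "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1].Value + ">"
                 + StyleCells(m.Groups[2].Value, "padding: 5px;")
                 + "</table>",
            Options);

        return content;
    }
}
```

Calling `TableFixer.Fix(content)` once before the Word conversion would then replace both `Regex.Replace` calls from the question.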

stema
  • Thank you! :-) It seems to work now. What does `[^>]*` do? – podeig Apr 16 '12 at 08:26
  • `[^>]` is a negated character class that matches anything but the `>` character. (It ensures that it will match only inside your tag.) – stema Apr 16 '12 at 08:34

Using XSLT, it could be solved like this:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />

    <xsl:template match="/">
        <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="table">
        <xsl:element name="table">
            <xsl:attribute name="cellspacing">0</xsl:attribute>
            <xsl:attribute name="cellpadding">0</xsl:attribute>
            <xsl:apply-templates select="@* | node()" />
        </xsl:element>
    </xsl:template>

    <xsl:template match="td">
        <xsl:element name="td">
            <xsl:if test="ancestor::table[1][not(@class='tbNoBorder')]">
                <xsl:attribute name="style">border: solid 1px #ccc; padding: 5px;</xsl:attribute>
            </xsl:if>
            <xsl:apply-templates />
        </xsl:element>
    </xsl:template>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
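
A minimal sketch of how such a stylesheet could be applied from C#, assuming the standard `System.Xml.Xsl.XslCompiledTransform` class. `XsltRunner` and its `Transform` method are hypothetical names. The sketch also assumes the input is well-formed XML with a single root element; ordinary HTML, or a fragment with two top-level tables like the sample in the question, would need to be cleaned up or wrapped first.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Applies the given stylesheet text to well-formed (X)HTML and returns the result.
    public static string Transform(string xhtml, string stylesheet)
    {
        var xslt = new XslCompiledTransform();

        // Compile the stylesheet from a string.
        using (var xslReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(xslReader);
        }

        using (var input = XmlReader.Create(new StringReader(xhtml)))
        using (var output = new StringWriter())
        {
            // Writing to a TextWriter lets the stylesheet's xsl:output
            // element (method="html" here) control serialization.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

A call such as `XsltRunner.Transform(content, stylesheetText)` would take the place of the two regex passes, with `stylesheetText` holding the stylesheet above.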
Filburt
  • Thank you, Filburt! I will try to implement it. I think it is a more correct implementation than regex. I will let you know how it went. :-) – podeig Apr 18 '12 at 09:19
  • I would guess the problem above isn't the only thing you'll need to handle, and XSLT will be far more flexible than string replacement with regex. If you hit any problems, just post a new question. – Filburt Apr 18 '12 at 12:04
  • It works well! Now I prefer XSLT over regex. Thank you! :-) Question: What does `<xsl:template match="@* | node()">` do? – podeig Apr 20 '12 at 12:03
  • The `<xsl:template match="@* | node()">` is the so-called identity template, which copies every node and attribute from the source to the output. It is used by the generic `<xsl:apply-templates />`. To be honest, I was surprised that I had to specify a `select` in the table template to capture the existing attributes. – Filburt Apr 20 '12 at 15:51