I want to remove the embedded <style> block from an HTML string using C#, while keeping inline style attributes. Here is my HTML:

<span style="font-family: tahoma; color: #9bbb59;">This is a simple text.</span><br />
<table>
    <thead>
    </thead>
    <tbody>
        <tr>
            <td>&nbsp;R1C1</td>
            <td>R1C2</td>
        </tr>
        <tr>
            <td>R2C1</td>
            <td>R2C2</td>
        </tr>
    </tbody>
</table>
<style type="text/css" id="telerik-reTable-1">
    .telerik-reTable-1   {
    border-width: 0px;
    border-style: none;
    border-collapse: collapse;
    font-family: Tahoma;
    }
    .telerik-reTable-1 td.telerik-reTableFooterEvenCol-1  {
    padding: 0in 5.4pt 0in 5.4pt;
    text-align: left;
    border-top: solid gray 1.0pt;
    }
</style>

After removing the <style> block, I want it to look like this:

<span style="font-family: tahoma; color: #9bbb59;">This is a simple text.</span><br />
<table>
    <thead>
    </thead>
    <tbody>
        <tr>
            <td>&nbsp;R1C1</td>
            <td>R1C2</td>
        </tr>
        <tr>
            <td>R2C1</td>
            <td>R2C2</td>
        </tr>
    </tbody>
</table>

I tried the pattern @"<\s*style[^(style>)]*style>", but it does not work.

Note: I don't think I can use HtmlDocument to remove the child node, because it does not maintain the parent-child node relationship, so I want to use a regular expression to remove the CSS.

corei11

3 Answers

You should not use regex to parse HTML documents. See this question to understand why:

RegEx match open tags except XHTML self-contained tags

Use an HTML parser such as Html Agility Pack instead. Here is how you can do it:

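        using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlInput);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//style");

        if (nodes != null)
        {
            foreach (var node in nodes)
                node.ParentNode.RemoveChild(node);
        }

        // Inline style attributes are untouched; only <style> elements are removed.
        string htmlOutput = doc.DocumentNode.OuterHtml;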
mybirthname

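Use System.Xml.Xsl.XslCompiledTransform (the older XslTransform class is obsolete) with a stylesheet like the one below. This only works if the input is well-formed XML: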

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="style" />
</xsl:stylesheet>
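
A minimal sketch of applying that stylesheet from C#, assuming the markup is well-formed XML with a single root element. The question's fragment would first need a wrapper element and would need &nbsp; replaced by a numeric entity such as &#160;. The class and method names (XsltStyleRemover, RemoveStyleElements) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltStyleRemover
{
    // Identity transform plus an empty template that drops <style> elements.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@*|node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""style"" />
</xsl:stylesheet>";

    // Assumption: wellFormedXml parses as XML and has exactly one root element.
    public static string RemoveStyleElements(string wellFormedXml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(wellFormedXml)))
        {
            // No XSLT parameters are needed, so the argument list is null.
            transform.Transform(input, null, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xhtml =
            "<div><span style=\"color: red;\">text</span>" +
            "<style type=\"text/css\">.a { color: red; }</style></div>";
        Console.WriteLine(RemoveStyleElements(xhtml));
    }
}
```

Because match="style" selects elements only, the style attribute on the span is copied through by the identity template. The result may start with an XML declaration, since the stylesheet has no xsl:output element to suppress it.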
Remo Gloor

Use this pattern to match each <style> block:

<style[^<]*</style\s*>

Explanation:

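  • <style matches the literal text <style.
  • [^<]* matches any run of characters other than <, so it consumes the attributes, the > that closes the opening tag, and the CSS up to the next <.
  • </ matches the literal text </.
  • style\s*> matches the word style, then zero or more whitespace characters, then >.

Because of [^<]*, the pattern will not match a block whose CSS itself contains a < character.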
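
As a rough sketch, the pattern can be applied with Regex.Replace as shown here. The class and method names (StyleStripper, RemoveStyleBlocks) and the sample input are illustrative, and RegexOptions.IgnoreCase is an addition so that <STYLE> is matched too. For comparison, the pattern in the question fails because [^(style>)] is a character class: it excludes the single characters (, s, t, y, l, e, > and ) rather than the word style.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleStripper
{
    // Pattern from the answer; assumes the CSS itself contains no '<'.
    private static readonly Regex StyleBlock =
        new Regex(@"<style[^<]*</style\s*>", RegexOptions.IgnoreCase);

    public static string RemoveStyleBlocks(string html)
    {
        // Replace every matched <style>...</style> block with nothing.
        return StyleBlock.Replace(html, string.Empty);
    }

    public static void Main()
    {
        string html =
            "<p>Hi</p><style type=\"text/css\">.a { color: red; }</style><br />";
        Console.WriteLine(RemoveStyleBlocks(html)); // prints <p>Hi</p><br />
    }
}
```

Inline attributes such as <span style="..."> are left alone, because the pattern requires the literal text <style at the start of the match.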
corei11