
I want to strip all attributes (and their values) within an HTML string using C# and RegEx.

For example:

<p>This is a text</p><span class="cls" style="background-color: yellow">This is another text</span>

would become

<p>This is a text</p><span>This is another text</span>

Also, I need to remove all attributes whether or not their values are surrounded by quotes.

i.e.

<p class="cls">Some content</p>
<p class='cls'>Some content</p>
<p class=cls>Some content</p>

should all result in

<p>Some content</p>

I cannot use HTMLAgilityPack due to security reasons, so I need to do this using RegEx.

aloisdg
    Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Sriram Sakthivel Sep 05 '13 at 14:57
    Maybe you will find your answer on this page: http://stackoverflow.com/questions/2994448/regex-strip-html-attributes-except-src – pardeew Dec 25 '13 at 14:11
    `I cannot use HTMLAgilityPack due to security reasons` Can you explain more about this? – aloisdg Feb 01 '15 at 19:24

1 Answer


Here is a solution without regex. It uses a mix of Substring() and IndexOf(). There is no error checking (for example, input without a closing '>' will throw); it is just an idea.


C# :

private static void Main(string[] args)
{
    string s = @"<p>This is a text</p><span class=""cls"" style=""background-color: yellow"">This is another text</span>";

    // Split on '<' so that each piece holds one tag followed by the text after it.
    var list = s.Split(new[] {"<"}, StringSplitOptions.RemoveEmptyEntries);
    foreach (var item in list)
        Console.Write(ClearAttributes('<' + item)); // put back the '<' removed by Split
    Console.ReadLine();
}

private static string ClearAttributes(string source)
{
    int startindex = source.IndexOf('<');
    int endindex = source.IndexOf('>');
    // Text between '<' and '>', e.g. "span class=..." or "/span".
    string tag = source.Substring((startindex + 1), (endindex - startindex - 1));
    // Everything after the first space inside the tag is treated as attributes.
    int spaceindex = tag.IndexOf(' ');
    if (spaceindex > 0)
        tag = tag.Substring(0, spaceindex);
    // Rebuild the bare tag and append the '>' plus any trailing text.
    return String.Concat('<', tag, source.Substring(endindex));
}

Output :

<p>This is a text</p><span>This is another text</span>
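
Since the question asks for RegEx specifically, here is a rough sketch of how the same idea might look with a single pattern. It is not a real HTML parser: the class name AttributeStripper and the method StripAttributes are made up for this example, and the pattern assumes reasonably well-formed markup. It skips over quoted values so that a '>' inside an attribute value does not end the tag early, but it will also rewrite tags that appear inside comments or script blocks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not taken from any library.
public static class AttributeStripper
{
    // Group 1: the tag name. Group 2: an optional '/' of a self-closing tag.
    // Between them the pattern lazily skips double-quoted values,
    // single-quoted values, or any character that is not a quote or '>'.
    // Closing tags such as </p> never match because '<' must be followed by a letter.
    private static readonly Regex OpeningTag = new Regex(
        @"<([A-Za-z][A-Za-z0-9:-]*)(?=[\s/>])(?:""[^""]*""|'[^']*'|[^'"">])*?(/?)>");

    public static string StripAttributes(string html)
    {
        // Rebuild each opening tag from its name and optional trailing slash.
        return OpeningTag.Replace(html, "<$1$2>");
    }
}
```

Calling AttributeStripper.StripAttributes on each of the three `<p class=...>` variants from the question gives `<p>Some content</p>`, and the span example gives the same output as above. A tag written as `<br />` comes back as `<br/>`.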
aloisdg