I have HTML to which jQuery adds random attributes, like this:

<td style='font-size: x-large;' jquery9202340423042='22423423424'>

Using C# Regex, I want to find and remove any attribute whose name starts with jquery.

I have the code below but it removes all attributes:

public static void Main(string[] args)
{
    string before = "<td style='font-size: x-large;' jquery9202340423042='22423423424'>";

    //string after = Regex.Replace(before, regexImgSrc, "<$1>");
    //string regexImgSrc = @"<(table|tr|td)[^>]*?" + "jquery9202340423042" + @"\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";

    string after = Regex.Replace(before, @"(?i)<(table|tr|td)(?:\s+(?:""[^""]*""|'[^']*'|[^""'>])*)?>", "<$1>");

    Console.WriteLine(after);
}
Shuaib
  • You want to change `<td style='font-size: x-large;' jquery9202340423042='22423423424'>` to `<td style='font-size: x-large;'>`? – Thomas Ayoub Feb 09 '16 at 13:23
  • Yes Thomas. You are right. – Shuaib Feb 09 '16 at 13:25
  • I dupehammered this to the idiomatic ["don't use regex to parse html"](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) question in order to prevent anyone else thinking that the right solution to handling HTML is with regex. It isn't. – spender Feb 09 '16 at 14:03

2 Answers

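You need to use this:

string after = Regex.Replace(before, @"(jquery\d*=[""']\d*[""'])", "");

This removes anything that follows the pattern jqueryXXX='XXX', where each XXX is a sequence of digits. Note that inside a verbatim string (@"...") a double quote is written as "", not \".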
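As a rough sketch of the same pattern-based idea, the version below also consumes the whitespace before the attribute and accepts double-quoted, single-quoted, or unquoted values. The class and method names (`JqueryAttributeStripper`, `Strip`) are hypothetical, and the pattern assumes the attribute name is always `jquery` followed by digits. It will also match the same text outside a tag, so it only suits narrow, well-known input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JqueryAttributeStripper
{
    // Assumption: the attribute looks like jquery<digits>=<value>.
    // \s+ consumes the whitespace before the attribute so no gap is left behind.
    // The value may be "double-quoted", 'single-quoted', or unquoted.
    private static readonly Regex JqueryAttribute = new Regex(
        @"\s+jquery\d+\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the input with every matching attribute removed.
    public static string Strip(string html)
    {
        return JqueryAttribute.Replace(html, "");
    }

    public static void Main()
    {
        string before = "<td style='font-size: x-large;' jquery9202340423042='22423423424'>";
        Console.WriteLine(Strip(before));
        // prints: <td style='font-size: x-large;'>
    }
}
```

Keeping the pattern in a static `Regex` field avoids rebuilding it on every call, which may matter if it is run over many fragments.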

Thomas Ayoub

Why are you trying to do this with Regex?

Regex is absolutely the wrong tool for the job (even though at a cursory glance, this might not be obvious to you).

Using Regex might work for specific cases, but will always be a brittle solution.

Use an HTML parser like HtmlAgilityPack and you can approach this far more sensibly. Now you can do something like this:

using System;
using System.Linq;
using HtmlAgilityPack;

string before = "<td style='font-size: x-large;' jquery9202340423042='22423423424'>";
var doc = new HtmlDocument();
doc.LoadHtml(before);
var el = doc.DocumentNode.FirstChild;
// Copy the matches into a list first so that removing them does not modify the collection being enumerated.
var attrsToRemove = el.Attributes.Where(att => att.Name.StartsWith("jquery")).ToList();
attrsToRemove.ForEach(a => a.Remove());
Console.WriteLine(el.OuterHtml);
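The snippet above handles a single element. A minimal sketch of how the same idea might extend to a whole document follows, assuming the HtmlAgilityPack NuGet package is referenced. `JqueryAttributeCleaner` and `RemoveJqueryAttributes` are hypothetical names chosen for illustration, and the sample markup in `Main` is made up.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class JqueryAttributeCleaner
{
    // Hypothetical helper: parses the HTML, removes every attribute whose
    // name starts with "jquery" from every node, and returns the result.
    public static string RemoveJqueryAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var node in doc.DocumentNode.Descendants())
        {
            // ToList() takes a snapshot so attributes can be removed
            // without modifying the collection being enumerated.
            var attrsToRemove = node.Attributes
                .Where(att => att.Name.StartsWith("jquery", StringComparison.OrdinalIgnoreCase))
                .ToList();

            foreach (var attr in attrsToRemove)
            {
                attr.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string before = "<table><tr jquery1='2'><td style='font-size: x-large;' jquery9202340423042='22423423424'>x</td></tr></table>";
        Console.WriteLine(RemoveJqueryAttributes(before));
    }
}
```

Walking `Descendants()` visits text and comment nodes as well; they carry no attributes, so the inner loop simply does nothing for them.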
spender
  • Thank you. I actually tried HtmlAgilityPack first, but the code seemed too long, and I needed to do this for the entire HTML document, not just one line. That is why I decided to do it with Regex. I have now gone back to HtmlAgilityPack and it is working well. – Shuaib Feb 09 '16 at 18:20