
I need to completely remove all style attributes from the given HTML code. I found the following regex, which matches an entire style attribute, and it works fine for the given HTML in online regex testers:

style\s*=\s*('|")[^\2]*?\2([^>]*)

However, when I use it from C# code, it doesn't work for the same HTML.

Here is the C# code:

Regex regex = new Regex("style\\s*=\\s*('|\")[^\\2]*?\\2([^>]*)", RegexOptions.IgnoreCase);
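A minimal reproduction sketch of the failure. The sample element is a hypothetical stand-in, since the original HTML sample did not survive. It also suggests why a JavaScript-flavored online tester (an assumption about which tester was used) may accept the pattern: in JavaScript, a backreference to a group that has not captured yet matches the empty string, whereas in .NET it never matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample element (the question's original HTML sample was lost).
string html = "<p style=\"color:red\">There are numerous applications</p>";

// The pattern from the question, unchanged.
var questionPattern = new Regex("style\\s*=\\s*('|\")[^\\2]*?\\2([^>]*)", RegexOptions.IgnoreCase);

// ('|") is group 1 and ([^>]*) is group 2. The \2 right after the lazy run
// refers to group 2, which has not captured anything at that point; in .NET
// such a backreference never matches, so the whole pattern fails.
bool matched = questionPattern.IsMatch(html);
Console.WriteLine(matched); // False
```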
– Dimax (question edited by dax)

2 Answers


I usually use the code below to strip <style> and <script> blocks, images, Office <o:...> tags, comments and class attributes from an Outlook message before saving it to the database:

using System.Text.RegularExpressions;

// Verbatim strings (@"...") so that regex escapes such as \s compile; "\s" in a regular C# literal is an error.
desc = Regex.Replace(desc, @"(<style.+?</style>)|(<script.+?</script>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, @"(<img.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, @"(<o:.+?</o:.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, @"<!--.+?-->", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, @"class=.+?>", ">", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, @"class=.+?\s", " ", RegexOptions.IgnoreCase | RegexOptions.Singleline);
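Collected into one self-contained method, the passes above might look like this sketch. The method name CleanOutlookHtml and the sample input are hypothetical; the patterns and their order are the answer's.

```csharp
using System;
using System.Text.RegularExpressions;

// Applies the answer's clean-up passes in the same order.
// None of these passes touches inline style="..." attributes.
string CleanOutlookHtml(string desc)
{
    const RegexOptions opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;
    desc = Regex.Replace(desc, @"(<style.+?</style>)|(<script.+?</script>)", "", opts); // <style>/<script> blocks
    desc = Regex.Replace(desc, @"(<img.+?>)", "", opts);                                 // <img> tags
    desc = Regex.Replace(desc, @"(<o:.+?</o:.+?>)", "", opts);                           // Office <o:...> elements
    desc = Regex.Replace(desc, @"<!--.+?-->", "", opts);                                 // HTML comments
    desc = Regex.Replace(desc, @"class=.+?>", ">", opts);                                // class=... up to the next '>'
    desc = Regex.Replace(desc, @"class=.+?\s", " ", opts);                               // class=... up to the next whitespace
    return desc;
}

string sample = "<style>p{}</style><p class=\"x\">Hi<!-- note --><img src=\"a.png\"></p>";
Console.WriteLine(CleanOutlookHtml(sample)); // <p >Hi</p>
```

Note the leftover space in <p >, and that class=.+?> also discards any attributes between class= and the end of the tag, so it is only safe where that loss is acceptable.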
– ZooZ
    Please don't add the same answer to multiple questions. Answer the best one and flag the rest as duplicates. See [Is it acceptable to add a duplicate answer to several questions?](http://meta.stackexchange.com/q/104227) – Bhargav Rao May 15 '16 at 09:44

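The regex should be:

 style\s*=\s*('|").*?\1

The quote is captured by group 1, so the closing backreference has to be \1. Your pattern's \2 refers to the later ([^>]*) group, which has not captured anything at that point, so in .NET it never matches. Also, \1 inside a character class is not a backreference, so [^\1] does not mean "any character except the opening quote"; a lazy .*? followed by \1 stops at the first matching quote instead.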
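A minimal C# sketch of the quote-matching approach, using a lazy .*? up to \1. It assumes an attribute value never contains the quote character that delimits it. The leading \s+ (so the whitespace before the attribute goes too, and names such as data-style are left alone), the Singleline option, and the helper name RemoveStyleAttributes are choices made for this example, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// \s+            whitespace before the attribute (removed along with it)
// style\s*=\s*   the attribute name and the equals sign
// ('|")          group 1: the opening quote
// .*?\1          lazily everything up to the same quote character
var styleAttribute = new Regex(@"\s+style\s*=\s*('|"").*?\1",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

string RemoveStyleAttributes(string html) => styleAttribute.Replace(html, string.Empty);

string sample = "<p style=\"color:red\" class=\"a\">Hi <span STYLE='font-family:\"Arial\"'>there</span></p>";
Console.WriteLine(RemoveStyleAttributes(sample)); // <p class="a">Hi <span>there</span></p>
```

The backreference is what lets a single-quoted value contain double quotes (and vice versa). A regex still cannot tell markup from text, though: a literal style="..." inside page text or a script would be removed too.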

That said, I would use HtmlAgilityPack:

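   using HtmlAgilityPack;

   var doc = new HtmlDocument();
   doc.Load(yourStream);
   // //*[@style] selects the elements that carry a style attribute.
   // By default SelectNodes returns null (not an empty collection) when nothing matches.
   var elementsWithStyleAttribute = doc.DocumentNode.SelectNodes("//*[@style]");
   if (elementsWithStyleAttribute != null)
   {
       foreach (var element in elementsWithStyleAttribute)
       {
           element.Attributes.Remove("style");
       }
   }
   doc.Save(yourOutputStream); // placeholder, like yourStream: where the cleaned HTML goes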
– Anirudha (answer edited by carla)
  • The modified regex doesn't work. I only need to remove the style attribute. – Dimax Oct 12 '13 at 11:39
  • @Dimax show the exact code you are using to replace and also a sample of that html – Anirudha Oct 12 '13 at 11:43
  • Following is the code: Regex regex = new Regex("style\\s*=\\s*('|\")[^\\2]*?\\2([^>]*)", RegexOptions.Multiline | RegexOptions.IgnoreCase); htmldoc.Content = regex.Replace(htmldoc.Content, string.Empty); Sample Html Code:

    There are numerous applications

    – Dimax Oct 12 '13 at 12:11
  • @Dimax you're not using the regex from my answer – Anirudha Oct 12 '13 at 12:33
  • Hi Anirudh, sorry for the delay. I did it using HtmlAgilityPack. Thanks for the sample code. – Dimax Nov 03 '13 at 15:00