
I am trying to remove a particular CSS property from an HTML string.

Here is my sample HTML string.

<span lang=EN-GB style='font-size:10.0pt;line-height:115%;font-family:"Tahoma","sans-serif";color:#17365D'>Thank you</span>

Is there any way to remove the line-height:115%; property from the string using Regex in C#/.NET, so that I get the output below?

<span lang=EN-GB style='font-size:10.0pt;font-family:"Tahoma","sans-serif";color:#17365D'>Thank you</span>

I have tried the Regex below, but it removes the whole style attribute. What I am trying to achieve is to remove only the line-height property.

Regex.Replace(html, @"<([^>]*)(?:style)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>", "<$1$2>", RegexOptions.IgnoreCase);

I just need to match the line-height property inside the style attribute, whatever value it has, and remove everything from the property name up to and including the terminating semicolon (;). Any help would be greatly appreciated. Thanks.

  • Just checking that opening the HTML in Notepad with find/replace isn't an option? – kevchadders May 06 '14 at 08:07
  • Please show what you have tried. – nxu May 06 '14 at 08:07
  • I would recommend using a DOM parser instead of regular expressions. Regex is not recommended when dealing with HTML/XML. – Jite May 06 '14 at 08:10
  • If you want to post code, edit your question instead of posting it in comments - it will be way more readable. – Spook May 06 '14 at 08:20
  • Parsing HTML with regex summons tainted souls into the realm of the living. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Fedor Hajdu May 06 '14 at 08:34
  • @Jite: As applicable as this advice is in the general case, wouldn't you say that this depends on context? If this is a quick find/replace through a limited set of files he has control over himself, would you still recommend that he use a DOM parser? – steinar May 06 '14 at 08:36
  • Hi, I think there's a little misunderstanding here. What I am trying to do is remove a certain property from the string at runtime. – user3607167 May 06 '14 at 08:41
  • @steinar That depends. If it's only this line (and that's all the input) that he wants to remove a known string from, then no, a parser might be a bit too much. Otherwise, yes, probably. – Jite May 06 '14 at 08:52

2 Answers


You could try using HtmlAgilityPack for this instead of Regex.

The example below is a little rough, but it should give you the idea.

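using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<span lang=EN-GB style='font-size:10.0pt;line-height:115%;font-family:\"Tahoma\",\"sans-serif\";color:#17365D'>Thank you</span>");

foreach (var item in doc.DocumentNode.Descendants("span"))
{
    var styleAttribute = item.Attributes["style"];
    if (styleAttribute == null) continue; // this span has no style attribute

    // Keep every declaration except line-height, whatever its value.
    // Note: splitting on ';' is naive and assumes no ';' inside a value.
    var kept = styleAttribute.Value.Split(';')
        .Where(m => !m.Trim().StartsWith("line-height:", StringComparison.OrdinalIgnoreCase));

    // Write the filtered declarations back to the attribute.
    styleAttribute.Value = string.Join(";", kept);
}

// The modified markup can then be read back from the document.
string result = doc.DocumentNode.OuterHtml;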
Yasser Shaikh

Thanks, everyone, for your kind suggestions. I have figured out a Regex for this situation. Here it is, in case anyone is interested.

html = Regex.Replace(html, @"line-height:[^;]+;", "", RegexOptions.IgnoreCase);
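
One thing to watch with the pattern above: it needs a trailing semicolon, so it will not match a line-height declaration that is last in the style attribute, and it would also match inside a longer property name such as mso-line-height-rule if that had a semicolon-terminated value. A slightly more defensive variant might look like the sketch below. The class name StyleCleaner and the method RemoveLineHeight are made up for illustration, and like the original it runs over the whole string, so it would also strip matching text that happens to appear outside a style attribute.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class StyleCleaner
{
    // (?<![\w-])   : "line-height" must not be the tail of a longer name
    //                (for example mso-line-height-rule).
    // \s*:         : allow whitespace before the colon.
    // [^;'"]*      : the value, up to the next semicolon or closing quote.
    // ;?           : the semicolon is optional, so a final declaration matches too.
    private static readonly Regex LineHeight = new Regex(
        @"(?<![\w-])line-height\s*:[^;'""]*;?",
        RegexOptions.IgnoreCase);

    public static string RemoveLineHeight(string html)
    {
        return LineHeight.Replace(html, "");
    }
}
```

Calling StyleCleaner.RemoveLineHeight(html) on the sample string from the question should produce the expected output shown there. Because the value is assumed to end at the first quote character, a quoted attribute that itself contains quotes before line-height ends would need a real parser instead.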