I want to remove title attributes from HTML, but only if the title value contains a '<' character.

Text to clean: <a href="" title="bla bla bla" /><a href="" title=" bla bl<a bla" />
Output text: <a href="" title="bla bla bla" /><a href="" />

As you can see, the second title was removed from the text because its value contains a < character.

Please help.

James S
Sbarut

2 Answers

Do yourself a favour and use an HTML parser when working with HTML; for example, Html Agility Pack.

Then tasks like this become as easy as:

var html = "<a href=\"\" title=\"bla bla bla\" /><a href=\"\" title=\" bla bl<a bla\" />";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

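// select all elements that have a title attribute
// (SelectNodes returns null when nothing matches)
var nodes = doc.DocumentNode.SelectNodes("//*[@title]");
if (nodes != null)
{
    foreach (var element in nodes)
    {
        // remove the attribute only if its value contains '<'
        if (element.Attributes["title"].Value.Contains("<"))
            element.Attributes["title"].Remove();
    }
}

// the cleaned markup
var result = doc.DocumentNode.OuterHtml;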
sloth
  • You forgot the "<" check. Something like `if (element.Attributes["title"].Value.Contains("<")){element.Attributes["title"].Remove();}` is needed. – netblognet Feb 05 '15 at 09:58

A suitable regular expression (in very simple terms) might be:

title="[^"]*<[^"]*"

This means title=" followed by any number of characters that are not ", then a <, then any number of further characters that are not ", and finally a closing ".
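To check what the pattern actually picks up, here is a minimal sketch that lists every match in a string. The class and method names (`TitlePatternDemo`, `FindMatches`) are made up for illustration, and the sketch assumes attribute values are always wrapped in double quotes.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TitlePatternDemo
{
    // The pattern from above, written as a C# verbatim string (quotes doubled).
    public const string Pattern = @"title=""[^""]*<[^""]*""";

    // Returns the text of every title attribute whose value contains '<'.
    public static List<string> FindMatches(string html)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the sample text from the question, `TitlePatternDemo.FindMatches(...)` should return a single entry, `title=" bla bl<a bla"`, because the first title contains no < before its closing quote.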

You can then use it as follows (note that the quotes are doubled inside C# verbatim string literals):

var test = @"<a href="""" title=""bla bla bla"" /><a href="""" title="" bla bl<a bla"" />";
var expression = @"title=""[^""]*<[^""]*""";
var rx = new Regex(expression);
var result = rx.Replace(test, "");

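In my quick test this removes only the second title attribute, as desired. Note that the space that preceded the attribute is left in place, so the second tag comes out as <a href=""  /> with two spaces.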
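If the output has to match the question's expected text exactly, one possible variation is to let the pattern also consume the whitespace in front of the attribute. The sketch below is only an illustration: `TitleCleaner` and `RemoveTitlesContainingLt` are hypothetical names, and it still assumes double-quoted attribute values, so for anything more irregular a real HTML parser remains the safer route.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TitleCleaner
{
    // \s+ swallows the whitespace before the attribute so that no
    // double space is left behind; IgnoreCase also catches TITLE="...".
    private static readonly Regex TitleWithLt =
        new Regex(@"\s+title=""[^""]*<[^""]*""", RegexOptions.IgnoreCase);

    // Removes every double-quoted title attribute whose value contains '<'.
    public static string RemoveTitlesContainingLt(string html)
    {
        return TitleWithLt.Replace(html, "");
    }
}
```

Calling `TitleCleaner.RemoveTitlesContainingLt` on the sample text from the question should give `<a href="" title="bla bla bla" /><a href="" />`.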

James S