Consider this blob of text:

@"
I want to match  the word 'highlight' in a string. But I don't want to match
highlight when it is contained in an HTML anchor element. The expression
should not match highlight in the following text: <a href='#'>highlight</a>
"

Here's what the output should look like (the first three occurrences of highlight are matches; the last one, inside the anchor, is not):

I want to match the word "highlight" in a string. But I don't want to match highlight when it is contained in an HTML anchor element. The expression should not match highlight in the following text: highlight

How would you construct an expression that matches all occurrences of X, excluding matches inside HTML anchor elements?

cllpse
  • See this answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Brann Oct 19 '10 at 08:58
  • ... Brilliant answer :) -- I still need a very basic expression to handle my problem. – cllpse Oct 19 '10 at 09:04

1 Answer

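I know you asked for a regex, but I won't provide one. Instead, here's a solution using Html Agility Pack: it parses the fragment and keeps only the text nodes that contain the word and whose parent is not an anchor.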

using System;
using System.Linq;          // needed for Where()
using HtmlAgilityPack;

public static void Parse()
{
    string htmlFragment =
        @"
    I want to match  the word 'highlight' in a string. But I don't want to match
    highlight when it is contained in an HTML anchor element. The expression
    should not match highlight in the following text: <a href='#'>highlight</a> more
    ";
    HtmlDocument htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(htmlFragment);
    // Select every node in the document, then keep only those that pass the filter.
    foreach (HtmlNode node in htmlDocument.DocumentNode.SelectNodes("//.").Where(FilterTextNodes()))
    {
        Console.WriteLine(node.OuterHtml);
    }
}

// Keeps text nodes that contain "highlight" and whose direct parent is not an <a> element.
// Text nested deeper inside an anchor (e.g. <a><b>highlight</b></a>) is not excluded.
private static Func<HtmlNode, bool> FilterTextNodes()
{
    return node => node.NodeType == HtmlNodeType.Text && node.ParentNode != null && node.ParentNode.Name != "a" && node.OuterHtml.Contains("highlight");
}
Mikael Svenson
  • I went with a JavaScript-based approach. So I'm gonna accept this answer in the name of being pragmatic :) – cllpse Oct 19 '10 at 10:09
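
For readers who still want the "very basic expression" the question asks for, here is a minimal regex-only sketch. It is not taken from the answer above. It assumes anchors are never nested and contain no child elements, so the first tag after an in-anchor occurrence is always the closing </a>. The class name AnchorAwareMatcher and the method CountMatches are hypothetical names used only for illustration. For arbitrary HTML, a parser-based approach like the one in the answer remains the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorAwareMatcher
{
    // Matches the whole word "highlight" unless the next tag after it is a closing </a>.
    // Assumption: anchors are flat (no nested tags inside <a>...</a>).
    private static readonly Regex Pattern =
        new Regex(@"\bhighlight\b(?![^<]*</a>)", RegexOptions.IgnoreCase);

    // Returns how many occurrences lie outside anchor elements.
    public static int CountMatches(string html)
    {
        return Pattern.Matches(html).Count;
    }
}
```

The negative lookahead scans forward over non-tag characters; if the first tag it reaches is </a>, the occurrence is treated as being inside an anchor and skipped. On the sample text from the question, CountMatches returns 3.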