
Suppose I have a string containing the following anchors, among other content:

<a href="foo.html" class="foo">Foo</a>
<a href="bar.html" class="excludeMe">Bar</a>
<a href="baz.html" class="baz">Baz</a>

Now I need a regex that matches all anchors that DO NOT have the class "excludeMe". Besides href and class, an anchor can have any number of other attributes, and they are not necessarily in a fixed order. However, an anchor with the "excludeMe" class will have only that one class. I have the following pattern for matching all anchors:

@"(<a.*?>.*?</a>)"
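As a baseline, here is a minimal C# sketch that runs the pattern above against the three sample anchors. It assumes .NET's `System.Text.RegularExpressions` and C# 9 top-level statements, and `MatchAll` is a hypothetical helper name. The pattern matches every anchor, including the `excludeMe` one, which is the behaviour that needs to change.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input: the three anchors from the question, one per line.
string html = string.Join("\n",
    @"<a href=""foo.html"" class=""foo"">Foo</a>",
    @"<a href=""bar.html"" class=""excludeMe"">Bar</a>",
    @"<a href=""baz.html"" class=""baz"">Baz</a>");

// Original pattern: lazily match "<a ...>", then lazily up to the first "</a>".
// Without RegexOptions.Singleline, "." does not cross newlines.
const string allAnchors = @"(<a.*?>.*?</a>)";

// Hypothetical helper: returns the text of every match.
string[] MatchAll(string input) =>
    Regex.Matches(input, allAnchors).Cast<Match>().Select(m => m.Value).ToArray();

foreach (string anchor in MatchAll(html))
    Console.WriteLine(anchor); // prints all three anchors, excludeMe included
```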

Now I need to extend it so that anchors with the "excludeMe" class are not matched. I've tried using a negative lookahead, but I can't get it right: depending on the attempt, it matches either all anchors, no anchors, or the content after the anchors.

Any suggestions on how to do it?

Thanks!

  • Don't use Regex for this. Use an HTML parsing library like [HtmlAgilityPack](http://html-agility-pack.net/) – maccettura May 15 '18 at 14:25
  • I know HtmlAgilityPack is generally better for parsing markup, but in this particular case I would like to know if it's possible to do with regex. – J. Hahn May 15 '18 at 19:23
  • Please don't do this. See this masterpiece of an answer: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Gerino May 16 '18 at 11:55
  • Enjoy your religion :) – J. Hahn May 17 '18 at 21:17


Ended up with this pattern that seems to do the job:

@"(<a(?![^>]+class=""excludeMe"").*?>.*?</a>)"
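Here is a minimal C# sketch of how this pattern might be used. It assumes .NET's `Regex.Matches` and C# 9 top-level statements, and `MatchNotExcluded` is a hypothetical helper name. A fourth sample anchor with `class` written first is added to check that attribute order doesn't matter. Like the question, it assumes `class="excludeMe"` is written with double quotes and no spaces around `=`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input: the question's anchors plus a hypothetical fourth one
// whose class attribute comes before href.
string html = string.Join("\n",
    @"<a href=""foo.html"" class=""foo"">Foo</a>",
    @"<a href=""bar.html"" class=""excludeMe"">Bar</a>",
    @"<a href=""baz.html"" class=""baz"">Baz</a>",
    @"<a class=""excludeMe"" href=""qux.html"" title=""Qux"">Qux</a>");

// The negative lookahead is tested right after "<a": the match fails there if
// class="excludeMe" occurs before the tag's closing ">". [^>]+ cannot cross ">",
// so only the current tag's attributes are inspected.
const string notExcluded = @"(<a(?![^>]+class=""excludeMe"").*?>.*?</a>)";

// Hypothetical helper: returns the text of every match.
string[] MatchNotExcluded(string input) =>
    Regex.Matches(input, notExcluded).Cast<Match>().Select(m => m.Value).ToArray();

foreach (string anchor in MatchNotExcluded(html))
    Console.WriteLine(anchor); // prints only the foo and baz anchors
```

Because the lookahead stops at the first `>`, it never looks into the link text or a later anchor. It would still be fooled by a `>` inside an attribute value that precedes `class`, which is one reason the comments recommend an HTML parser.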