3

I would like to write a regular expression in javascript to match specific text, only when it is not part of an html link, i.e.

match <a href="/link/page1">match text</a>

would not be matched, but

match text

or

<p>match text</p>

would be matched.

(The "match text" will change each time the search is run - I will use something like

var tmpStr = new RegExp("\\bmatch text\\b", "g");

where the value of "match text" is read from a database.)
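
Since the search text comes from a database, a pattern built from a string needs two precautions: backslashes must be doubled inside the string literal (a single `"\b"` is a backspace character, not a word boundary), and any regex metacharacters in the text should be escaped. A minimal sketch, assuming hypothetical helper names `escapeRegExp` and `buildWordPattern` that are not part of the original post:

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so text read from the database is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a global, whole-word pattern for the search text.
function buildWordPattern(text) {
  // "\\b" in a string literal yields the regex token \b (word boundary);
  // a single "\b" would be a backspace character instead.
  return new RegExp("\\b" + escapeRegExp(text) + "\\b", "g");
}

var tmpStr = buildWordPattern("match text");
console.log(tmpStr.source);
console.log("<p>match text</p>".match(tmpStr));
```

Logging `tmpStr.source` is a quick way to check that the pattern that reached the regex engine is the one intended.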

So far my best effort at a regular expression is

\bmatch text\b(?!</a>)

This deals with the closing </a>, but not the initial <a href=...>. This will probably work fine for my purposes, but it does not seem ideal. I'd appreciate any help with refining the regular expression.

Alan Moore
Tomba
  • in before http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 –  Feb 10 '10 at 00:57
  • Sorry Will, beat you by 11 seconds. :) – Amber Feb 10 '10 at 00:58
  • Thanks for the quick responses. I don't think it's the same thing - I want to match only the text inside the tags - but not if the tags are present (i.e. match `text`, but not `<a>text</a>`) - but I guess your message is not to use regular expressions to parse HTML? – Tomba Feb 10 '10 at 01:02
  • 1
    Well essentially, it depends on how specific your case is. For instance, do you also want to avoid matching `<a>test match text foo</a>`? If so, the problem becomes much, much harder to do with regex than if you always know that the things you don't want to match will never occur with other text inside a link. – Amber Feb 10 '10 at 01:09
  • @Dav I hadn't thought about that. As it happens, I don't need to avoid matching that in my case, so Eric's suggestion should work fine – Tomba Feb 10 '10 at 01:13
  • See this [previous SO question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Amber Feb 10 '10 at 00:57

2 Answers

4

You can use a negative look-behind to get the opening <a href=...:

var tmpStr = new RegExp('(?<!<a.*?>)match text(?!</a>)');

Hope that works for you.

Eric Wendelin
  • Did you mean "(?!<a.*?>)match text(?!</a>)"? This is exactly what I was looking for, thank you very much – Tomba Feb 10 '10 at 01:08
  • 1
    Note that this won't avoid matching, say, the `match text` inside of `<a>test match text foo</a>`. – Amber Feb 10 '10 at 01:10
  • @Dav: Right, sorry, didn't take it that far. Though it sounds like it is hard/impossible to handle every case ;) – Eric Wendelin Feb 10 '10 at 01:35
  • 1
    @Tomba, I believe `(?<!<a.*?>)` *is* what Eric intended to write. You were using a negative lookbehind, weren't you Eric? Trouble is, JavaScript doesn't support lookbehinds. But even if it did, regexes would be useless for this task unless you could simplify the problem somehow, as Dav suggested above. – Alan Moore Feb 10 '10 at 02:40
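
One way to get the effect without a lookbehind is to let the pattern consume whole links first and only act on what is left. The sketch below is a different technique from the answer above, not what Eric proposed: an alternation matches either a complete `<a>...</a>` element or the search text, and a replace callback leaves the links untouched. The function name `linkify` and its parameters are hypothetical, and it assumes well-formed, non-nested, lowercase `<a>` tags.

```javascript
// Hypothetical helper: wrap plain occurrences of `text` in a link to `href`,
// leaving any existing <a>...</a> element untouched.
function linkify(html, text, href) {
  // Escape regex metacharacters so the search text is matched literally.
  var escaped = text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Alternative 1 consumes a whole anchor element; alternative 2 (captured)
  // is the search text found outside of any anchor.
  var pattern = new RegExp("<a\\b[^>]*>[\\s\\S]*?</a>|\\b(" + escaped + ")\\b", "g");
  return html.replace(pattern, function (whole, plain) {
    // `plain` is undefined when the anchor alternative matched.
    return plain === undefined ? whole : '<a href="' + href + '">' + plain + "</a>";
  });
}

var input = '<p>match text</p> and <a href="/link/page1">test match text foo</a>';
console.log(linkify(input, "match text", "/link/page1"));
```

Because the anchor alternative is tried first at each position, text inside an existing link is swallowed together with its tags, which also covers the `test match text foo` case raised in the comments. It is still a regex over HTML, so unusual markup (nested tags, `<a` inside attributes or comments) can defeat it.
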
3

Thanks for the very quick and helpful answers. Just to clarify, the regular expression I ended up using was

(?!<a.*?>)\bmatch text\b(?!</a>)
Tomba
  • 1
    You realize that the above expression will match `<a>match text </a>`, correct? In fact, it will match anything where there's a space or other text before the `</a>`, because the `(?!<a.*?>)` is literally doing nothing - the regex you've posted above is *exactly identical* in function to the 'best effort' posted in your OP: `\bmatch text\b(?!</a>)` - why? Because `(?!<a.*?>)\b` is identical to `\b` - a lookahead for something that is not a word boundary, followed by a required word boundary, will only match a word boundary. – Amber Feb 10 '10 at 04:22
  • 1
    Essentially, there are two cases here: either you need to match `match text` anywhere except where it is the *only thing* inside the link (i.e. `<a>match text</a>`, no spaces, no other tags, nothing) - in which case your regex in the OP already would have worked fine without modification; or you need to match the text but only if it's not inside a link, even if wrapped in other text (i.e. the `match text` in `<a>test match text foo</a>` *shouldn't* be matched), in which case the regex above won't work. Either way, you don't actually gain anything from adding `(?!<a.*?>)` to the front. – Amber Feb 10 '10 at 04:29
  • @Dav - Thanks for explaining that. though it's not obvious from the question, I would ideally like to match "match text" wherever it is enclosed in tags, whether or not there are spaces (basically what I want to do is find a certain string, then convert it into a link, unless it's already a link). However, I can be 99% sure that the match will be the only thing inside the link, so the original (OP) regular expression will probably work fine in practice. – Tomba Feb 10 '10 at 08:01
  • Thanks to Dav and others for pointing out that my answer does not make sense! I have not deleted the answer because of the valuable comments below. – Tomba Feb 10 '10 at 09:05
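
The point made in the comments, that the leading lookahead adds nothing, can be checked directly. A small sketch comparing the regex from the question with the one from this answer; the sample strings are illustrative, not taken from real data:

```javascript
// The 'best effort' from the question, and the version with the extra lookahead.
var original  = /\bmatch text\b(?!<\/a>)/;
var withExtra = /(?!<a.*?>)\bmatch text\b(?!<\/a>)/;

var samples = [
  "match text",
  "<p>match text</p>",
  '<a href="/link/page1">match text</a>',
  '<a href="/link/page1">match text </a>',
  '<a href="/link/page1">test match text foo</a>'
];

// Print the result of each pattern next to the sample it was tested on.
samples.forEach(function (s) {
  console.log(original.test(s), withExtra.test(s), s);
});
```

Both patterns give the same result on every sample. Only the third string is rejected, so a link containing a trailing space or extra text is still matched by either version.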