Match word boundaries except for angle brackets

Question

I'm trying to do multiple replacements in a large block of text, converting words into hyperlinks with HTML tags. The expression `(\b)(word)(\b)` works most of the time for finding the words I want, but angle brackets (`<` and `>`) apparently count as boundaries, so when I run the expression again on the same string, it matches words that I have already converted into links. I found a workaround in the expression `([\s-_()])(word)([\s-_()])`, but this requires me to know which characters are allowed around the word rather than which are disallowed. Is there a way to write an expression that says 'match this word with boundaries, except for < and >'?

Note - I CANNOT use the global flag. This is meant to be used to do 'n' replacements within a block of text, somewhere between 1 and all.

Example:

var str = "Plain planes are plain.  Plain pork is plain.  Plain pasta is plainly plain.";
str = str.replace(/(\b)(plain)(\b)/, "$1<a href='www.plain.com'>$2</a>$3");
// this will result in the first instance of 'plain' becoming 
// a link to www.plain.com

str = str.replace(/(\b)(plain)(\b)/, "$1<a href='www.plain.com'>$2</a>$3");
// this will NOT make the second instance of 'plain' into 
// a link to www.plain.com
// instead, the 'plain' in the middle of the anchor tag 
// is matched and replaced causing nested anchor tags

Use JavaScript to walk the DOM and apply the regex replacement only to text nodes. [Here is how](http://stackoverflow.com/a/2525465/1633117) — Martin Ender, Dec 07 '12 at 18:22
Any part of the DOM that is not a tag. Check out the snippet I linked. If you replace the third line with your replacement code, it should work just fine. — Martin Ender, Dec 07 '12 at 18:32
Looks like it would work, but I'm concerned that it could use up a lot of resources if you're working with a large set of text (i.e. a news article or something) — Mathias Schnell, Dec 07 '12 at 19:28
It's the only reliable solution. What you want is to differentiate between occurrences inside tags and outside of tags. That effectively amounts to parsing HTML. And [that cannot be done with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) — Martin Ender, Dec 07 '12 at 22:36
Another side note. There is no need to capture the `\b` and write them back. `\b` matches a position, not a character, so no characters are actually consumed by it. — Martin Ender, Dec 07 '12 at 22:47
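A minimal sketch of the text-node approach suggested in the comments above, not the linked snippet itself. The function names `collectLinkableTextNodes` and `linkWord`, and the `limit` parameter (the 'n' from the question), are hypothetical. The sketch assumes `root` is an element in a browser document and that `word` contains no regex metacharacters.

```javascript
// Collect text nodes under `root`, skipping anything already inside an <a>.
// Only relies on nodeType, nodeName and childNodes, so it works on real DOM
// nodes and on plain objects shaped like them.
function collectLinkableTextNodes(root, out = []) {
  for (const child of Array.from(root.childNodes || [])) {
    if (child.nodeType === 3) {
      // 3 === Node.TEXT_NODE
      out.push(child);
    } else if (child.nodeType === 1 && child.nodeName.toUpperCase() !== "A") {
      // 1 === Node.ELEMENT_NODE; existing links are not descended into
      collectLinkableTextNodes(child, out);
    }
  }
  return out;
}

// Wrap up to `limit` occurrences of `word` in links and return how many
// were wrapped. Browser-only: needs a real Document for createElement
// and real Text nodes for splitText.
function linkWord(root, word, href, limit, doc = root.ownerDocument) {
  // Assumption: `word` has no regex metacharacters. No capture groups are
  // needed around \b because it matches a position, not a character.
  const pattern = new RegExp("\\b" + word + "\\b");
  const queue = collectLinkableTextNodes(root);
  let done = 0;
  while (queue.length > 0 && done < limit) {
    const node = queue.shift();
    const match = pattern.exec(node.nodeValue);
    if (!match) continue;
    const target = node.splitText(match.index);     // node starting at the match
    const rest = target.splitText(match[0].length); // node holding the text after it
    const link = doc.createElement("a");
    link.href = href;
    target.parentNode.replaceChild(link, target);
    link.appendChild(target);
    queue.unshift(rest); // keep scanning the remainder of this text node
    done += 1;
  }
  return done;
}
```

In a browser this might be called as `linkWord(document.getElementById("article"), "plain", "www.plain.com", 2)`, where the `article` id is made up for illustration. Because text inside existing `<a>` elements is never visited, repeated calls cannot nest links, and the walk visits each node once, so the cost grows linearly with the size of the document.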

score 0 · Answer 1 · edited Dec 18 '12 at 08:38

You could try a negative lookbehind like:

(?<!<a href='www\.)(\b)(plain)(\b)
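A quick check of this lookbehind against the string from the question suggests it guards only the `plain` inside the `href`: the link text is preceded by `>`, so it still matches on the second pass. The sketch below shows that, followed by a hypothetical alternative built from two negative lookaheads. The alternative assumes the only markup in the string is the anchors inserted by the replacement itself; it is not a general HTML parser and would break on nested tags inside links or `>` inside attribute values. Note that lookbehind needs an ES2018-capable engine, while the lookahead variant does not.

```javascript
const LINK = "<a href='www.plain.com'>$&</a>"; // $& inserts the matched text

// The lookbehind from this answer: guards the 'plain' inside the href only.
const lookbehind = /(?<!<a href='www\.)\bplain\b/;

// Hypothetical alternative using two negative lookaheads:
//   (?![^<>]*>)     the next angle bracket after the match is not '>',
//                   i.e. the match is not inside a tag such as <a href='...'>
//   (?![^<]*<\/a>)  the next '<' after the match does not start '</a>',
//                   i.e. the match is not the text of an existing link
const unlinked = /\bplain\b(?![^<>]*>)(?![^<]*<\/a>)/;

// Apply a non-global replacement `times` times, as in the question.
function replaceTimes(text, pattern, times) {
  let result = text;
  for (let i = 0; i < times; i++) {
    result = result.replace(pattern, LINK);
  }
  return result;
}

const str = "Plain planes are plain.  Plain pork is plain.  Plain pasta is plainly plain.";

const nested = replaceTimes(str, lookbehind, 2); // second pass re-links the link text
const linked = replaceTimes(str, unlinked, 2);   // second pass links the next plain word
console.log(nested);
console.log(linked);
```

Since neither pattern uses the global flag, calling `replace` n times converts at most n occurrences, which fits the constraint in the question.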

edited Dec 18 '12 at 08:38

Echilon


answered Dec 18 '12 at 08:17

IDKFA

