
I'm trying to do multiple replacements in a large block of text, converting words into hyperlinks with HTML tags. The expression (\b)(word)(\b) works most of the time for finding the words I want, but one problem is that angle brackets (< and >) count as word boundaries, so when I run the expression again on the same string, I match words that I have already converted into links. I just found a workaround in the expression ([\s-_()])(word)([\s-_()]), but it requires me to list the characters that are allowed around the word rather than the ones that are disallowed. Is there a way to write an expression that says 'match this word with boundaries, except for < and >'?

Note: I CANNOT use the global flag. This is meant to perform n replacements within a block of text, anywhere from one to all of the occurrences.

Example:

var str = "Plain planes are plain.  Plain pork is plain.  Plain pasta is plainly plain.";
str = str.replace(/(\b)(plain)(\b)/, "$1<a href='www.plain.com'>$2</a>$3");
// this will result in the first instance of 'plain' becoming 
// a link to www.plain.com

str = str.replace(/(\b)(plain)(\b)/, "$1<a href='www.plain.com'>$2</a>$3");
// this will NOT make the second instance of 'plain' into 
// a link to www.plain.com
// instead, the 'plain' in the middle of the anchor tag 
// is matched and replaced causing nested anchor tags
Alan Moore
Mathias Schnell
  • Use JavaScript to walk the DOM and apply the regex replacement only to text nodes. [Here is how](http://stackoverflow.com/a/2525465/1633117) – Martin Ender Dec 07 '12 at 18:22
  • Text nodes? I'm not sure what you mean. – Mathias Schnell Dec 07 '12 at 18:30
  • Any part of the DOM that is not a tag. Check out the snippet I linked. If you replace the third line with your replacement code, it should work just fine. – Martin Ender Dec 07 '12 at 18:32
  • Looks like it would work, but I'm concerned that it could use up a lot of resources if you're working with a large set of text (e.g. a news article or something) – Mathias Schnell Dec 07 '12 at 19:28
  • It's the only reliable solution. What you want is to differentiate between occurrences inside tags and outside of tags. That effectively amounts to parsing HTML. And [that cannot be done with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Martin Ender Dec 07 '12 at 22:36
  • Another side note. There is no need to capture the `\b` and write them back. `\b` matches a position, not a character, so no characters are actually consumed by it. – Martin Ender Dec 07 '12 at 22:47
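
As a rough illustration of the text-node idea from the comments, for the case where the markup is only available as a string (as in the question): one could split the string into tags and text, and replace only in text that sits outside existing anchors. This is a minimal sketch, not an HTML parser. `linkInText` and its parameters are hypothetical names, and it assumes simple, well-formed markup with no `>` inside attribute values. It uses a bare `\b` with no capture groups, since `\b` consumes no characters. Note that it does use the global flag, but only within each text segment, and the counter in the callback caps the number of links.

```javascript
// Hypothetical helper (not from the thread): link up to `limit` occurrences
// of `word` that appear in text outside tags and outside existing anchors.
function linkInText(html, word, href, limit) {
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const wordPattern = new RegExp("\\b" + escaped + "\\b", "g");

  // Splitting on a capturing group keeps the tags in the result:
  // even indices hold text, odd indices hold tags.
  const parts = html.split(/(<[^>]*>)/);
  let anchorDepth = 0;
  let remaining = limit;

  for (let i = 0; i < parts.length && remaining > 0; i++) {
    const part = parts[i];
    if (i % 2 === 1) {
      // Track whether we are currently inside <a ...> ... </a>.
      if (/^<a\b/i.test(part)) anchorDepth += 1;
      else if (/^<\/a\s*>/i.test(part)) anchorDepth -= 1;
    } else if (anchorDepth === 0) {
      // The callback only sees matches in the original segment, so links
      // inserted here are never rescanned.
      parts[i] = part.replace(wordPattern, (match) => {
        if (remaining <= 0) return match;
        remaining -= 1;
        return "<a href='" + href + "'>" + match + "</a>";
      });
    }
  }
  return parts.join("");
}

// Usage: two separate passes, one replacement each.
const once = linkInText("Plain pork is plain. Plain pasta is plainly plain.", "plain", "www.plain.com", 1);
const twice = linkInText(once, "plain", "www.plain.com", 1);
console.log(twice);
```

Because the second pass skips everything between `<a ...>` and `</a>`, it links the next unlinked occurrence instead of nesting anchors. For real pages, walking the DOM as suggested above remains the more robust route.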

1 Answer


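You could try a negative lookbehind like the one below. Note that JavaScript only supports lookbehind in engines that implement ES2018 or later, and that this pattern guards only the `plain` inside the `href` value, so the link text between `<a ...>` and `</a>` can still match: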

(?<!<a href='www\.)(\b)(plain)(\b)
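
A variation on the same lookaround idea, sketched here with negative lookaheads instead of a lookbehind so that it also skips the link text and keeps to the no-global-flag constraint. `linkNext` is a hypothetical name, and the pattern rests on assumptions: the markup is well formed, attribute values contain no `>`, and anchor text contains no nested tags. Treat it as a sketch under those assumptions rather than a general solution.

```javascript
// Hypothetical helper: link the first occurrence of `word` that is neither
// inside a tag nor inside the text of an existing anchor.
function linkNext(text, word, href) {
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "\\b" + escaped + "\\b" +
    "(?![^<>]*>)" +     // next angle bracket is not '>': we are not inside a tag
    "(?![^<]*<\\/a>)"   // next tag is not '</a>': we are not inside anchor text
  );
  // No global flag: each call replaces at most one occurrence.
  return text.replace(pattern, (match) => "<a href='" + href + "'>" + match + "</a>");
}

// Usage: call it n times to get n replacements.
let sample = "Plain planes are plain.  Plain pork is plain.  Plain pasta is plainly plain.";
sample = linkNext(sample, "plain", "www.plain.com");
sample = linkNext(sample, "plain", "www.plain.com");
console.log(sample);
```

The first lookahead rejects the `plain` in `www.plain.com`, because the text after it reaches a `>` before any `<`. The second rejects the `plain` that is immediately followed by `</a>`. Once every occurrence is linked, further calls leave the string unchanged.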

Echilon
IDKFA