Description
This expression will:
- allow you to replace only the "hello world" substrings that are outside the anchor tags
- avoid the tricky edge cases that make pattern matching in HTML difficult
- not use atomic groups, since JavaScript doesn't support them
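If you want to confirm the atomic-group limitation in your own engine, here's a quick check. It's a sketch that assumes Node.js or a browser console, and supportsAtomicGroups is just a name I made up. It tries to compile a pattern containing (?>...) and reports whether the engine accepted it.

```javascript
// Hypothetical helper: try to compile a regex that uses an atomic group, (?>...).
// JavaScript's RegExp grammar has no atomic groups, so the constructor is expected to throw.
function supportsAtomicGroups() {
  try {
    new RegExp("(?>a+)b");
    return true; // the engine accepted the (?>...) syntax
  } catch (err) {
    return false; // SyntaxError: the engine rejected the (?>...) syntax
  }
}

console.log(supportsAtomicGroups()); // false
```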
Regex
((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)
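Since groups 1 and 3 are identical, one way to stop them drifting apart when you edit the pattern is to build the regex from a single string. This sketch assumes a modern JavaScript engine (Node.js or a browser console). SKIP and TARGET are illustrative names I picked. I also chose the g and i flags so that every match, including the capitalised "Hello world" ones, gets replaced.

```javascript
// Groups 1 and 3 share this sub-pattern, so define it once (illustrative name).
// String.raw keeps every backslash exactly as written in the regex above.
const SKIP = String.raw`(?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*`;
const TARGET = String.raw`hello\sworld\s\d+`;

// "g" replaces every match; "i" lets "Hello world 1" match as well as "hello world 2".
const re = new RegExp(`(${SKIP})(${TARGET})(${SKIP})`, "gi");

const input = '<a href="#">hello world 1</a> hello world 2!';
console.log(input.replace(re, "$1_______$3"));
// <a href="#">hello world 1</a> _______!
```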
Full Explanation
Theory:
((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)
Captures the anchor tags and any text outside the anchor tags that is not "hello world". This is group 1.
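The second branch of group 1, (?!hello\sworld|<a\s)., is a "tempered dot". It consumes one character at a time, but only if that character isn't the start of a "hello world" or an anchor tag. Here it is pulled out on its own. temperedDot is an illustrative name, and the ^ anchor and i flag are my additions, so you can see exactly where it stops:

```javascript
// The non-anchor branch of group 1, repeated from the start of the string.
// It stops just before "hello world" or "<a " followed by whitespace.
const temperedDot = /^(?:(?!hello\sworld|<a\s).)*/i;

console.log("Intro text, then hello world 1".match(temperedDot)[0]);
// "Intro text, then "
console.log('Some text <a href="#">link</a>'.match(temperedDot)[0]);
// "Some text "
```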
(hello\sworld\s\d+)
Captures the "hello world". This is group 2. Since I added digits to my sample text to help show which substrings were being captured, I also added \s\d+ to this section. Yes, arguably this is beyond your original scope. :)
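If your real text doesn't have numbers after each "hello world", you could drop the \s\d+ from group 2 and keep just hello\sworld. Here's a small sketch of the difference, using the group 2 pattern on its own with the i flag added. The variable names are only for illustration:

```javascript
// Group 2 as written requires a trailing number ("\s\d+").
const withNumber = /hello\sworld\s\d+/i;
// Without the sample-text digits, group 2 is just hello\sworld.
const withoutNumber = /hello\sworld/i;

console.log(withNumber.test("say hello world!"));    // false: no number follows
console.log(withoutNumber.test("say hello world!")); // true
console.log("hello world 12!".match(withNumber)[0]); // "hello world 12"
```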
((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)
Captures the anchor tags and any text outside the anchor tags that is not "hello world". This is group 3. It's the same pattern as group 1, but it is required, or else you might encounter odd results on the last match in the string.
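To see what each group actually holds for a single match, you can run exec on a short string. This sketch uses the regex from above with the i flag added (my choice, so that "Hello" matches too) and a made-up input string:

```javascript
// Same pattern as above, plus the "i" flag.
const re = /((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)/i;

const m = re.exec('<a href="#">hello world 00</a>Hello world 1! <a href="#">hello world 02</a> end');
console.log(m[1]); // <a href="#">hello world 00</a>   (group 1: skipped anchor)
console.log(m[2]); // Hello world 1                     (group 2: the target)
console.log(m[3]); // ! <a href="#">hello world 02</a> end   (group 3: everything up to the next target)
```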
Replace With
In the samples below I used this replacement to help make it more obvious what's happening:
$1_______$3
To replace your "hello world" strings with anchor tags, you could use this:
$1<a href="$2">$2</a>$3
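In JavaScript, the same replacement can also be written as a function, because String.prototype.replace calls a replacer function with the full match followed by each capture group. This is a sketch: linkify is a hypothetical name, and the g and i flags and the input string are my own choices.

```javascript
// Hypothetical helper: produces the same output as the "$1<a href="$2">$2</a>$3" replacement string.
function linkify(match, before, target, after) {
  return before + '<a href="' + target + '">' + target + "</a>" + after;
}

const re = /((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)/gi;

const input = '<a href="#">hello world 00</a>Hello world 1! hello world 2!';
const viaString = input.replace(re, '$1<a href="$2">$2</a>$3');
const viaFunction = input.replace(re, linkify);

console.log(viaString === viaFunction); // true
console.log(viaFunction);
```

The function form is handy if you later want to transform the captured text before inserting it, for example passing it through encodeURIComponent for the href. That step is my suggestion, not something the pattern requires.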

Examples
Sample text
Note the difficult edge cases in the anchor tag with the onmouseover attribute. I also numbered each "hello world" so they are easier for us humans to read.
<a href="#">hello world 00</a>Hello world 1! hello world 2! Hello world 3! <a onmouseover=' a=1; href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href="#">hello world 04</a><p>hello world 5</p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa
Sample JavaScript
<script type="text/javascript">
// g = replace every match, i = also match "Hello world"
var re = /((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)/gi;
var sourcestring = "source string to match with pattern";
// single quotes, so the double quotes around $2 don't end the string
var replacementpattern = '$1<a href="$2">$2</a>$3';
var result = sourcestring.replace(re, replacementpattern);
alert("result = " + result);
</script>
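The script above needs your own HTML in sourcestring. To run it against the sample text instead, outside a browser, something like this should work. It's a sketch that assumes Node.js, with console.log in place of alert. The sample goes in a template literal because it contains both single and double quotes. The g and i flags are what make every occurrence get replaced, including the capitalised "Hello world" ones.

```javascript
// Same regex as the script above, with the "g" and "i" flags.
const re = /((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)(hello\sworld\s\d+)((?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>|(?!hello\sworld|<a\s).)*)/gi;

// The sample text from above.
const sourcestring = `<a href="#">hello world 00</a>Hello world 1! hello world 2! Hello world 3! <a onmouseover=' a=1; href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href="#">hello world 04</a><p>hello world 5</p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa`;

// First replacement: blanks out the matches so you can see what was replaced.
console.log(sourcestring.replace(re, "$1_______$3"));
// Second replacement: wraps each match in an anchor tag.
console.log(sourcestring.replace(re, '$1<a href="$2">$2</a>$3'));
```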
String After Replacement
Using the first replacement ($1_______$3), just to show what's happening:
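<a href="#">hello world 00</a>_______! _______! _______! <a onmouseover=' a=1; href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href="#">hello world 04</a><p>_______</p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa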
Using the second replacement, to show that it actually works:
<a href="#">hello world 00</a><a href="Hello world 1">Hello world 1</a>! <a href="hello world 2">hello world 2</a>! <a href="Hello world 3">Hello world 3</a>! <a onmouseover=' a=1; href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href="#">hello world 04</a><p><a href="hello world 5">hello world 5</a></p><p><a href="#">hello world 06</a></p> <a href="#">hello world 07</a>fdafdsa