I'm trying to get a link's text using a regex. Several links may match the pattern, and I want the last matching one, up to the 4th match. Here is my JS code:

var level=1;
while ( _match = /<a href="http:\/\/www.mysite.com\/x\/(?:.*)>(.*)<\/a>/img.exec(_html)){
    if (level < 5)  (_anchor_text=_match[1]);
    level ++;
}

The problem is that this code enters an infinite loop in IE (it works fine in FF), even though the pattern is present in the HTML. Any help is appreciated.

Nir
  • This code actually used to work up to FF3.6, because the same RegExp object was reused in every iteration (in compliance with ES3). But then ES3 was replaced by ES5, which changed the way RegExp literals are handled: "Regular expression literals now return a unique object each time the literal is evaluated." That effectively renders the `g` flag useless in your case. http://es5.github.com/#E Again, IE was ahead of its time. – Robert Mar 26 '13 at 00:57

1 Answer

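RegExp.exec, I believe, relies on the regex object's lastIndex property: with the g flag, each call starts searching at lastIndex and then moves it to the end of the match, which is what makes stepping through successive matches possible. For that to work, every call has to go through the same regular expression object. Currently you're creating a new one on every iteration, so the search restarts from the beginning each time, the first match is found again, and the loop never ends.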

Try this:

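var level = 1;
// Build the regex once, outside the loop, so lastIndex carries over between exec() calls.
var pattern = /<a href="http:\/\/www.mysite.com\/x\/(?:.*)>(.*)<\/a>/img;
var _match;
while ((_match = pattern.exec(_html)) !== null) {
    // Keep the latest anchor text, but stop updating after the 4th match.
    if (level < 5) _anchor_text = _match[1];
    level++;
}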
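
To make the difference concrete, here is a minimal sketch that contrasts a single shared regex with a literal that is re-evaluated on every call. The sample HTML and the helper names (`collectAnchorTexts`, `firstMatchWithFreshRegex`) are hypothetical, and the pattern is a tightened variant of the one in the question (`[^>]*` and a lazy `.*?` in place of the greedy `.*`), which is my assumption rather than part of the original code. The second helper behaves as described only in an engine that follows ES5 or later, where each evaluation of a literal yields a new object.

```javascript
// Hypothetical sample input; the markup is made up for illustration.
var html =
    '<a href="http://www.mysite.com/x/1">one</a>\n' +
    '<a href="http://www.mysite.com/x/2">two</a>\n' +
    '<a href="http://www.mysite.com/x/3">three</a>';

// Collect up to `limit` anchor texts using one regex object for every exec() call.
function collectAnchorTexts(source, limit) {
    // Assumed variant of the question's pattern: escaped dots, [^>]* and lazy .*?
    var pattern = /<a href="http:\/\/www\.mysite\.com\/x\/[^>]*>(.*?)<\/a>/ig;
    var texts = [];
    var match;
    // exec() advances pattern.lastIndex after each match and returns null at the end.
    while (texts.length < limit && (match = pattern.exec(source)) !== null) {
        texts.push(match[1]);
    }
    return texts;
}

// Each call evaluates the literal again, so (in ES5+) lastIndex starts at 0
// and exec() reports the first match every time.
function firstMatchWithFreshRegex(source) {
    var match = /<a href="http:\/\/www\.mysite\.com\/x\/[^>]*>(.*?)<\/a>/ig.exec(source);
    return match ? match[1] : null;
}

var texts = collectAnchorTexts(html, 4);
var repeated = [firstMatchWithFreshRegex(html), firstMatchWithFreshRegex(html)];
```

With the shared regex, `texts` holds each anchor text once, in document order. With the fresh literal, both entries of `repeated` are the first anchor text, which is why a `while` loop built on it never reaches a `null` result.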
James
  • It actually works on Firefox, Chrome, Opera and Safari, if you use a regexp literal within the while statement. IE seems to be the one behaving differently. This is not to say that what IE is doing is wrong... – Ates Goral May 04 '10 at 20:02
  • @Ates, I think that behaviour is due to the fact that literal regular expressions are "cached" internally, so when you re-use one, you're just referencing the same regex object. – James May 04 '10 at 20:56