111

I am seriously going crazy over this and I've already spent a disproportionate amount of time trying to figure out what's going on here. So please give me a hand =)

I need to do some RegExp matching of strings in JavaScript. Unfortunately it behaves very strangely. This code:

var rx = /(cat|dog)/gi;
var w = new Array("I have a cat and a dog too.", "There once was a dog and a cat.", "I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.");

for (var i in w) {
    var m = null;
    m = rx.exec(w[i]);
    if(m){
        document.writeln("<pre>" + i + "\nINPUT: " + w[i] + "\nMATCHES: " + m.slice(1) + "</pre>");
    }else{
        document.writeln("<pre>" + i + "\n'" + w[i] + "' FAILED.</pre>");
    }
}

This returns "cat" and "dog" for the first two elements, as it should, but then some exec() calls start returning null. I don't understand why.

I posted a Fiddle here, where you can run and edit the code.

And so far I've tried this in Chrome and Firefox.

casperOne
cpak

4 Answers

110

Oh, here it is. Because you're defining your regex as global, it matches cat first, and on the second pass of the loop it continues from where it left off and matches dog. So, basically, you just need to reset your regex (its internal pointer) as well. Compare this:

var w = new Array("I have a cat and a dog too.", "I have a cat and a dog too.", "I have a cat and a dog too.", "I have a cat and a dog too.");

for (var i in w) {
    var rx = /(cat|dog)/gi;
    var m = null;
    m = rx.exec(w[i]);
    if(m){
        document.writeln("<p>" + i + "<br/>INPUT: " + w[i] + "<br/>MATCHES: " + w[i].length + "</p>");
    }else{
        document.writeln("<p><b>" + i + "<br/>'" + w[i] + "' FAILED.</b><br/>" + w[i].length + "</p>");
    }
    document.writeln(m);
}
SilentGhost
  • Woh– "internal pointer of a regex"? Could you recommend a resource on that one? Thanks! – katerlouis Jan 14 '19 at 15:00
  • Whoa... I've been writing JavaScript intensively for the last 14 years, and `RegExp`s more & more intensively for the last 8 years -- and this blows my mind pretty hard. Would I have a better understanding of this if I was better at Perl? – Cody Oct 20 '21 at 02:14
  • I would give you 1000 up votes if I could. That just saved me hours and blows my mind. – psteinroe Apr 01 '22 at 08:47
93

The regex object has a property lastIndex which is updated when you run exec. So when you exec the regex on e.g. "I have a cat and a dog too.", lastIndex is set to 12. The next time you run exec on the same regex object, it starts looking from index 12. So you have to reset the lastIndex property between each run.
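To make the lastIndex behaviour concrete, here is a minimal sketch that replays the first three iterations of the question's loop and records lastIndex after each call. The variable names (first, second, afterFirst and so on) are made up for illustration; only exec() and lastIndex come from the RegExp API.

```javascript
// Sketch: how lastIndex carries over between exec() calls on one global regex.
var rx = /(cat|dog)/gi;
var first = "I have a cat and a dog too.";
var second = "There once was a dog and a cat.";

// 1st call: search starts at index 0 and finds "cat" at index 9.
var m1 = rx.exec(first);
var afterFirst = rx.lastIndex;   // 12, just past "cat"

// 2nd call on a DIFFERENT string: search still starts at index 12,
// so it finds "dog" at index 17 of the second string.
var m2 = rx.exec(second);
var afterSecond = rx.lastIndex;  // 20, just past "dog"

// 3rd call: search starts at index 20 of the first string, which is
// already past both "cat" and "dog", so exec() returns null.
var m3 = rx.exec(first);
var afterThird = rx.lastIndex;   // a failed match resets lastIndex to 0

// Resetting by hand before each call makes every search start at 0 again.
rx.lastIndex = 0;
var m4 = rx.exec(first);

console.log(m1[1], afterFirst, m2[1], afterSecond, m3, afterThird, m4[1]);
```

This is why the failures in the question alternate: a failed exec() resets lastIndex to 0 on its own, so the call after a failure succeeds again.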

Frode
  • Thanks for the explanation! It helps a lot by setting `myRe.lastIndex = 0;` for subsequent use. – Antony Jan 20 '13 at 01:52
  • I think this should be the correct answer because it follows the best practice of reusing the same regex object – smurtagh Mar 18 '19 at 20:17
  • Agree this should be the correct answer. It reuses the same regex object and also explains the internal mechanics. OP should consider changing. – Sean Coley Dec 16 '19 at 16:30
40

Two things:

  1. As already mentioned, the regex needs a reset when the g (global) flag is used. To solve this, I recommend simply assigning 0 to the lastIndex property of the RegExp object. This performs better than destroying and recreating the object.
  2. Be careful when using the in keyword to walk an Array object, because it can lead to unexpected results with some libraries. Sometimes you should check with something like isNaN(i), or, if you know the array has no holes, use the classic for loop.

The code can be:

var rx = /(cat|dog)/gi;
var w = ["I have a cat and a dog too.", "There once was a dog and a cat.", "I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat."];

for (var i in w) {
    if (!isNaN(i)) {           // Optional: skip non-index keys if the Array could have some odd members.
        var m = rx.exec(w[i]); // Run
        rx.lastIndex = 0;      // Reset
        if (m) {
            document.writeln("<pre>" + i + "\nINPUT: " + w[i] + "\nMATCHES: " + m.slice(1) + "</pre>");
        } else {
            document.writeln("<pre>" + i + "\n'" + w[i] + "' FAILED.</pre>");
        }
    }
}
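
The second point above can be illustrated with a small sketch. It assumes a library has added an enumerable method to Array.prototype; the name sketchHelper is hypothetical and stands in for whatever such a library might add. for...in then visits that name too, while the isNaN(i) check or a classic for loop only sees the numeric indexes.

```javascript
// Hypothetical: simulate a library that extends Array.prototype.
Array.prototype.sketchHelper = function () {};

var words = ["cat", "dog"];

// for...in walks every enumerable key, including inherited ones.
var allKeys = [];
for (var k in words) {
    allKeys.push(k);             // "0", "1", then "sketchHelper"
}

// The isNaN() guard keeps only keys that look like numbers.
var guardedKeys = [];
for (var g in words) {
    if (!isNaN(g)) {
        guardedKeys.push(g);     // "0", "1"
    }
}

// The classic for loop never sees the extra key at all.
var indexes = [];
for (var j = 0; j < words.length; j++) {
    indexes.push(j);             // 0, 1
}

// Clean up the simulated library method.
delete Array.prototype.sketchHelper;

console.log(allKeys, guardedKeys, indexes);
```

Note that for...in yields the keys as strings, whereas the classic loop yields numbers.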
ESL
  • This should be the correct answer. Setting `rx.lastIndex = 0` is much better than re-creating the RegEx object inside the loop. – Minoru Sep 25 '19 at 19:27
  • Still, it would be better to just not use the `g` flag when you don't want it. It makes no sense to create a regex that specifically updates `lastIndex` just to reset it after each execution. – Robert Aug 08 '21 at 19:44
  • You may well want to search globally within one item, and then reset and reuse the regex on the next one. I think the OP's code is just an example to show what they do not understand. – ESL Sep 03 '21 at 19:16
5

I had a similar problem using /g only, and the solution proposed here did not work for me in Firefox 3.6.8. I got my script working with

var myRegex = new RegExp("my string", "g");

I'm adding this in case someone else has the same problem I did with the above solution.

Don
  • This "bug" was fixed in ES5. Originally, literal regexes were only instantiated once. Therefore it wasn't necessary to store them in a variable. The formerly succinct `while(/a/g.exec(text)) {...}` now has to be written as `regex = /a/g; while(regex.exec(text)) {...}`. This change probably broke a lot of code on the web, but it is way less error-prone. On the other hand, when you want to reset `lastIndex` after each execution, the correct solution has always been just to remove the `g` flag. – Robert Aug 08 '21 at 19:23
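
The two approaches discussed in the comments can be compared in a short sketch, assuming a modern (ES5 or later) engine. Without the g flag, exec() always searches from the start of the string, so no reset is needed. With the g flag, the regex is stored in a variable so the loop can advance through all matches. The variable names (plainRx, globalRx, found) are illustrative only.

```javascript
var text = "I have a cat and a dog too.";

// Without g: exec() does not advance lastIndex, so every call
// starts at index 0 and returns the first match again.
var plainRx = /(cat|dog)/i;
var firstCall = plainRx.exec(text)[1];    // "cat"
var secondCall = plainRx.exec(text)[1];   // "cat" again
var plainLastIndex = plainRx.lastIndex;   // stays 0

// With g: keep ONE regex object in a variable and loop until exec()
// returns null; each call resumes at lastIndex.
var globalRx = /(cat|dog)/gi;
var found = [];
var m;
while ((m = globalRx.exec(text)) !== null) {
    found.push(m[1]);
}
// The final failed exec() has already reset lastIndex to 0, so globalRx
// can be reused on the next string without a manual reset.
var globalLastIndex = globalRx.lastIndex;

console.log(firstCall, secondCall, plainLastIndex, found, globalLastIndex);
```

Which form fits depends on the goal: the non-global regex suits a single "does it match" check per string, while the global loop suits collecting every match in each string.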