
Why do JavaScript sub-matches stop working when the g modifier is set?

var text = 'test test test test';

var result = text.match(/t(e)(s)t/);
// Result: ["test", "e", "s"]

The above works fine, result[1] is "e" and result[2] is "s".

var result = text.match(/t(e)(s)t/g);
// Result: ["test", "test", "test", "test"]

The above ignores my capturing groups. Is the following the only valid solution?

var result = text.match(/test/g);
for (var i = 0; i < result.length; i++) {
    console.log(result[i].match(/t(e)(s)t/));
}
/* Result:
["test", "e", "s"]
["test", "e", "s"]
["test", "e", "s"]
["test", "e", "s"]
*/

EDIT:

I am back again to happily tell you that 10 years later you can now do this (.matchAll has been added to the spec):

let result = [...text.matchAll(/t(e)(s)t/g)];
Chad Cache

2 Answers


Using String's match() function won't return captured groups if the global modifier is set, as you found out.

In this case, you would want to use a RegExp object and call its exec() function. String's match() is almost identical to RegExp's exec() function…except in cases like these. If the global modifier is set, the normal match() function won't return captured groups, while RegExp's exec() function will. (Noted here, among other places.)

Another catch to remember is that exec() doesn't return the matches in one big array—it keeps returning matches until it runs out, in which case it returns null.

So, for example, you could do something like this:

var pattern = /t(e)(s)t/g;  // Alternatively, "new RegExp('t(e)(s)t', 'g');"
var match;    

while ((match = pattern.exec(text)) !== null) {
    // Do something with the match (["test", "e", "s"]) here...
}
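If a single array of match arrays is what you are after, the loop above can be wrapped in a small helper. This is only a sketch: `collectMatches` is a hypothetical name, not a built-in, and the guard against a missing g flag is an added assumption to avoid an endless loop.

```javascript
// Hypothetical helper (not part of the original answer): gathers every
// exec() result into one array of match arrays.
function collectMatches(pattern, text) {
    if (!pattern.global) {
        // Without the g flag, exec() never advances lastIndex, so the
        // loop below would never terminate on a matching string.
        throw new TypeError('collectMatches requires a regex with the g flag');
    }
    pattern.lastIndex = 0; // start from the beginning, whatever ran before
    var matches = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        if (match[0] === '') {
            pattern.lastIndex++; // avoid looping forever on empty matches
        }
        matches.push(match);
    }
    return matches;
}

var allMatches = collectMatches(/t(e)(s)t/g, 'test test test test');
// allMatches has 4 entries; each one looks like ["test", "e", "s"]
```

Each entry also keeps the `index` property that exec() sets, so the position of every match is still available.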

Another thing to note is that, for a regex with the g flag, RegExp.prototype.exec() and RegExp.prototype.test() both start searching at the regex's lastIndex property and return the first match found from that position. Each successful call sets lastIndex to the position just after the match, so successive calls step through the string; a call that finds nothing returns null (or false for test()) and resets lastIndex to 0.

Here's an example:

// remember there are 4 matches in the example text, and pattern.lastIndex starts at 0

pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9
pattern.exec(text); // pattern.lastIndex = 14
pattern.exec(text); // pattern.lastIndex = 19

// if we were to call pattern.exec(text) again, it would return null and reset pattern.lastIndex to 0
var match;
while ((match = pattern.exec(text)) !== null) {
    // never gets run because we already traversed the string
    console.log(match);
}

pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9

// however, we can reset lastIndex, which lets us traverse the string again
// from the start (or from any specific position in the string)
pattern.lastIndex = 0;

while ((match = pattern.exec(text)) !== null) {
    // outputs all matches
    console.log(match);
}
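For contrast, here is a minimal sketch of what happens without the g flag: lastIndex is never advanced, so every call returns the first match again. That is why an exec() loop over a non-global regex never ends. The variable names are illustrative only.

```javascript
var text = 'test test test test';

// No g flag: exec() always searches from the start of the string.
var nonGlobal = /t(e)(s)t/;
var first = nonGlobal.exec(text);
var second = nonGlobal.exec(text);
// first.index and second.index are both 0, and nonGlobal.lastIndex stays 0

// With the g flag: each call resumes where the previous match ended.
var globalPattern = /t(e)(s)t/g;
var firstGlobal = globalPattern.exec(text);  // match at index 0
var secondGlobal = globalPattern.exec(text); // match at index 5
```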

You can find information on how to use RegExp objects on the MDN (specifically, here's the documentation for the exec() function).

idmean
hbw
  • 3
    using exec doesn't seem to listen to the g modifier, but it supports sub-matches/groups. So the result would be the first match (it basically ignores the g modifier) – Chad Cache May 09 '09 at 21:03
  • Added a clarification about that—you have to call exec() repeatedly to get the multiple matches. – hbw May 09 '09 at 21:05
  • 2
    Not the most elegant solution. i was expecting an output somewhat like this: [ ["test", "e", "s"], ["test", "e", "s"], ["test", "e", "s"], ["test", "e", "s"] ] – Chad Cache May 09 '09 at 21:13
  • @htw this was a while back but i was rereading this and i found it weird that you did `new RegExp(/t(e)(s)t/g);` as an alt. wouldn't it be `new RegExp('t(e)(s)t', 'g');` – Chad Cache Dec 05 '12 at 05:59
  • That makes more sense and seems more in-line with how the explicit RegExp constructor would actually be used. I've updated my answer accordingly. – hbw Dec 05 '12 at 09:28
  • 1
    Old, old question I know, but I had a need of this recently, and I whipped this up: `RegExp.prototype.execAll = function(s) { var r = [],m; while(m = this.exec(s)) r.push(m); return r; }`. With that, you can do: `/t(e)(s)t/g.execAll("test")` and get the results that @ChadScira was looking for. – rossipedia Jul 10 '13 at 21:18
  • 3
    Note for others bumping into another problem: If you use `.test()` before it, make sure you reset the lastIndex using `pattern.lastIndex = 0` before the `while` loop to get all the matches – Iulian Onofrei Apr 10 '14 at 08:33
  • @IulianOnofrei Thanks, updated answer in question detailing whats actually going on. – Chad Cache Jul 03 '14 at 00:51
  • @ChadScira Great explanation, thanks! Hope it helps others too. – Iulian Onofrei Jul 03 '14 at 06:42
  • 3
    The g flag is not ignored. It needs to be there, otherwise you'll get an infinite loop. Found out the hard way here :) – Sarsaparilla Oct 07 '14 at 23:44
  • 1
    Explicitly addressing another corner case: If the regex has a capture group but global modifier _isn't_ being used, match() will return *the full match first*, then all substrings matching the capture. E.g. `'foobar'.match(/f(o)*(ba)/)` will return `["fooba", "o", "ba"]`. – Eric Nguyen Jan 09 '15 at 15:58

I am surprised to see that I am the first person to answer this question with the answer I was looking for 10 years ago (the answer did not exist yet). I also was hoping that the actual spec writers would have answered it before me ;).

.matchAll is now part of the language standard (ES2020) and is available in all current browsers and in Node.js.

In modern JavaScript we can now accomplish this by just doing the following.

let result = [...text.matchAll(/t(e)(s)t/g)];
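As a rough illustration of what that line produces: every entry is a match array like the one exec() returns, including its `index`. Note that matchAll throws a TypeError if the regex lacks the g flag. The `summary` and `missingFlagError` names below are just for this sketch.

```javascript
const text = 'test test test test';

// Spread the iterator returned by matchAll into a plain array.
const result = [...text.matchAll(/t(e)(s)t/g)];

// Pull the full match, both groups, and the position out of each entry.
const summary = result.map(m => ({
    full: m[0],
    first: m[1],
    second: m[2],
    index: m.index
}));

// Unlike exec(), matchAll refuses a regex without the g flag.
let missingFlagError = null;
try {
    text.matchAll(/t(e)(s)t/);
} catch (e) {
    missingFlagError = e;
}
```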

.matchAll spec

.matchAll docs

I now maintain an isomorphic JavaScript library that helps with a lot of this type of string parsing. You can check it out here: string-saw. It makes .matchAll easier to use with named capture groups.

An example would be

saw(text).matchAll(/t(e)(s)t/g)

Which outputs a more user-friendly array of matches, and if you want to get fancy you can throw in named capture groups and get an array of objects.
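For comparison, and without assuming anything about string-saw's API, an array of objects can also be built with plain matchAll and named capture groups. The group names `vowel` and `consonant` are made up for this sketch.

```javascript
const text = 'test test test test';

// Named capture groups end up on the `groups` property of each match.
const pattern = /t(?<vowel>e)(?<consonant>s)t/g;

// Copy each `groups` object into a plain object.
const objects = [...text.matchAll(pattern)].map(m => ({ ...m.groups }));
// every entry has the shape { vowel: 'e', consonant: 's' }
```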

Chad Cache