63

Let's say I have the string

"12345"

If I .match(/\d{3}/g), I only get one match, "123". Why don't I get [ "123", "234", "345" ]?

user3025492
  • 7
    You only get one match because `"123"` was already matched, and the remaining characters, `"45"`, don't match. If you were to use `/\d{2}/g` instead you'd get `['12','34']`. Anyway, there's an answer in SO to get matching strings even if they overlap: http://stackoverflow.com/a/14863268/2563028 – EfrainReyes Dec 30 '13 at 04:27

6 Answers

34

String#match with a global-flag regex returns an array of all matched substrings. The /\d{3}/g regex matches and consumes a 3-digit sequence (that is, the engine reads it and advances its index to the position right after the last matched character). Thus, after "eating up" 123, the index is located after 3, and the only substring left to scan is 45, which does not match.
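The consumption described above can be observed through `lastIndex`. This is a minimal sketch; the variable names are chosen here for illustration only:

```javascript
var re = /\d{3}/g;
var str = '12345';

var first = re.exec(str);           // matches "123" at index 0
var indexAfterFirst = re.lastIndex; // 3: the next search starts right after "123"
var second = re.exec(str);          // only "45" is left, so there is no match

console.log(first[0], indexAfterFirst, second); // 123 3 null
```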

I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test every position inside the input string. After each test, RegExp.lastIndex (a read/write integer property of regular expressions that specifies the index at which to start the next match) is advanced "manually" to avoid an infinite loop.

Note that this technique is implemented in .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all), Ruby (String#scan) and can be used in Java, too. Here is a demo using matchAll:

var re = /(?=(\d{3}))/g;
console.log( Array.from('12345'.matchAll(re), x => x[1]) );

Here is an ES5 compliant demo:

var re = /(?=(\d{3}))/g;
var str = '12345';
var m, res = [];
 
while (m = re.exec(str)) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    res.push(m[1]);
}

console.log(res);

Here is a regex101.com demo

Note that the same can be written with a "regular" consuming \d{3} pattern by manually setting re.lastIndex to m.index + 1 after each successful match:

var re = /\d{3}/g;
var str = '12345';
var m, res = [];

while (m = re.exec(str)) {
    res.push(m[0]);
    re.lastIndex = m.index + 1; // <- Important
}
console.log(res);
Matias Kinnunen
Wiktor Stribiżew
  • 1
    Oh, yes, thanks for the note! Deleted the related comment: "I think there's just a small mistake in the last source code block: Instead of res.push(m[0]); one has to use res.push(m[1]); as the match result is being stored in index 1 instead of index 0 of the array m" – Nighty42 May 15 '21 at 13:26
23

You can't do this with a regex alone, but you can get pretty close:

var pat = /(?=(\d{3}))\d/g;
var results = [];
var match;

while ( (match = pat.exec( '1234567' ) ) != null ) { 
  results.push( match[1] );
}

console.log(results);

In other words, you capture all three digits inside the lookahead, then go back and match one character in the normal way just to advance the match position. It doesn't matter how you consume that character; . works just as well as \d. And if you're really feeling adventurous, you can use just the lookahead and let JavaScript handle the bump-along.
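To illustrate that the consuming character is interchangeable, here is a small sketch that runs the same loop with both variants. The helper name `overlapping` is hypothetical, introduced only for this comparison. Because each successful match consumes one character, `lastIndex` advances on every iteration and the loop terminates.

```javascript
// Hypothetical helper: collects group 1 of every match of a global regex.
function overlapping(str, pat) {
  var results = [], match;
  pat.lastIndex = 0; // start from the beginning of the string
  while ((match = pat.exec(str)) !== null) {
    results.push(match[1]); // the three digits captured inside the lookahead
  }
  return results;
}

var withDigit = overlapping('1234567', /(?=(\d{3}))\d/g);
var withDot = overlapping('1234567', /(?=(\d{3}))./g);

console.log(withDigit); // [ '123', '234', '345', '456', '567' ]
console.log(withDot);   // [ '123', '234', '345', '456', '567' ]
```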

This code is adapted from this answer. I would have flagged this question as a duplicate of that one, but the OP accepted another, lesser answer.

T.J. Crowder
Alan Moore
  • As the condition of the while loop never changes this source code produces an endless loop... As @Wiktor Stribiżew already mentioned in his answer one would have to change the index of the regex object to be able to change the matching results. – Nighty42 May 14 '21 at 11:11
14

When an expression matches, it usually consumes the characters it matched. So, after the expression matched 123, only 45 is left, which doesn't match the pattern.

Felix Kling
7

To answer the "How", you can manually change the index of the last match (requires a loop):

var input = '12345', 
    re = /\d{3}/g, 
    r = [], 
    m;
while (m = re.exec(input)) {
    re.lastIndex -= m[0].length - 1;
    r.push(m[0]);
}
r; // ["123", "234", "345"]

Here is a function for convenience:

function matchOverlap(input, re) {
    var r = [], m;
    // prevent infinite loops
    if (!re.global) re = new RegExp(
        re.source, (re+'').split('/').pop() + 'g'
    );
    while (m = re.exec(input)) {
        re.lastIndex -= m[0].length - 1;
        r.push(m[0]);
    }
    return r;
}

Usage examples:

matchOverlap('12345', /\D{3}/)      // []
matchOverlap('12345', /\d{3}/)      // ["123", "234", "345"]
matchOverlap('12345', /\d{3}/g)     // ["123", "234", "345"]
matchOverlap('1234 5678', /\d{3}/)  // ["123", "234", "567", "678"]
matchOverlap('LOLOL', /lol/)        // []
matchOverlap('LOLOL', /lol/i)       // ["LOL", "LOL"]
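
As a possible variation, the helper can read the flags from `RegExp.prototype.flags` (ES2015) instead of parsing the regex's string form, and can work on a copy so the caller's `lastIndex` is not modified. This is a sketch under those assumptions; the name `matchOverlapModern` is hypothetical and not part of the answer above.

```javascript
function matchOverlapModern(input, re) {
  // Work on a copy so the caller's regex (and its lastIndex) is untouched,
  // and make sure the copy is global so exec() honours lastIndex.
  var flags = re.flags.indexOf('g') === -1 ? re.flags + 'g' : re.flags;
  var copy = new RegExp(re.source, flags);
  var r = [], m;
  while ((m = copy.exec(input)) !== null) {
    // Restart the search one character after the start of this match.
    copy.lastIndex = m.index + 1;
    r.push(m[0]);
  }
  return r;
}

console.log(matchOverlapModern('1234 5678', /\d{3}/)); // [ '123', '234', '567', '678' ]
console.log(matchOverlapModern('LOLOL', /lol/i));      // [ 'LOL', 'LOL' ]
```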
0

I would consider not using a regex for this. If you want every overlapping group of three characters, you can just loop over the string and take a slice at each offset:

let s = "12345"
let m = Array.from(s.slice(2), (_, i) => s.slice(i, i+3))
console.log(m)
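
Note that slicing alone does not check that the characters are digits. A minimal sketch of a generalized version, with a hypothetical `windows` helper and an optional filter step for inputs that contain non-digits:

```javascript
// Hypothetical helper: every substring of length n, one per start offset.
function windows(s, n) {
  var out = [];
  for (var i = 0; i + n <= s.length; i++) {
    out.push(s.slice(i, i + n));
  }
  return out;
}

var all = windows('12345', 3);
// Keep only the windows made up entirely of digits.
var digitsOnly = windows('1234 5678', 3).filter(function (w) {
  return /^\d{3}$/.test(w);
});

console.log(all);        // [ '123', '234', '345' ]
console.log(digitsOnly); // [ '123', '234', '567', '678' ]
```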
Mark
-1

Use (?=(\w{3}))

(3 being the number of letters in the sequence)
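
One way this pattern might be applied, sketched with a hypothetical `overlappingWords` helper that builds the regex for a given length and reads capture group 1 via `matchAll` (which needs the g flag). Keep in mind that `\w` matches letters, digits and the underscore, not just letters.

```javascript
// Hypothetical helper: all overlapping runs of n word characters.
function overlappingWords(str, n) {
  var re = new RegExp('(?=(\\w{' + n + '}))', 'g');
  return Array.from(str.matchAll(re), function (m) {
    return m[1]; // the text captured inside the lookahead
  });
}

console.log(overlappingWords('abcde', 3)); // [ 'abc', 'bcd', 'cde' ]
console.log(overlappingWords('ab_12', 2)); // [ 'ab', 'b_', '_1', '12' ]
```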

Rupert Schiessl