2

Where n=4 in my example.

I'm very new to Regex and have searched for 20 minutes now. There are some helpful websites out there that simplify things but I can't work out how to proceed with this.

I wish to extract every combination of 4 consecutive digits from this:

12345

to get:

1234 - possible with /^\d{4}/g - starts at the beginning
2345 - possible with /\d{4}$/g - starts at the end

But I can't get both! The input could be any length.

Thomas Ayoub
DaveHolt

3 Answers

2

Your expression isn't working as expected because those two substrings overlap.

Apart from zero-length assertions, every character that is matched is consumed, so the next match can only begin after the end of the previous one. That is why overlapping matches are not found.
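To make the consumption behaviour concrete, here is a minimal sketch using a plain `\d{4}` pattern with the global flag. The variable names are only illustrative.

```javascript
// Plain global match: each match consumes its characters,
// so the next search starts after the previous match ends.
const consuming = /\d{4}/g;

// '12345' has room for only one non-overlapping run of four digits.
const fromFive = '12345'.match(consuming);

// '12345678' yields two runs, but none of the overlapping ones in between.
const fromEight = '12345678'.match(consuming);

console.log(fromFive);  // [ '1234' ]
console.log(fromEight); // [ '1234', '5678' ]
```

Note that `2345` never appears in either result, because the digits `234` were already consumed by the first match.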

You can work around this by putting a capturing group inside a lookahead. Lookahead assertions (like lookbehind assertions) are zero-length: they check what follows without consuming any characters, so the next match attempt starts just one position further along and the overlapping matches are found. The digits themselves are read from capture group 1.

(?=(\d{4}))

Here is a quick snippet demonstrating this:

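var regex = /(?=(\d{4}))/g;
var input = '12345678';
var match;

while ((match = regex.exec(input)) !== null) {
    // A zero-length match leaves lastIndex where it was, so advance it
    // manually; otherwise exec() would keep returning the same match.
    if (match.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // Group 1 holds the four digits captured inside the lookahead.
    console.log(match[1]);
}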
Josh Crozier
  • Thank you for the detailed explanation. I just asked [this question](http://stackoverflow.com/questions/42456201/what-should-the-javascript-match-regex-function-return) as I'm having trouble returning the matches. What you say about zero-length assertions has gone over my head a bit, but it might explain why I'm getting odd results. EDIT: I should clarify that I'm not using your code, as it resulted in an infinite loop for some reason. – DaveHolt Feb 25 '17 at 12:53
1

You can use a lookahead with a capturing group:

(?=(\d{4}))

See demo
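As a rough sketch of how this pattern might be used for any n, the snippet below wraps it in a function. The name `overlappingDigitRuns` is hypothetical, and the sketch assumes an environment that supports `String.prototype.matchAll` (ES2020), which steps past zero-length matches on its own.

```javascript
// Hypothetical helper: returns every overlapping run of n digits in input.
function overlappingDigitRuns(input, n) {
    // Build (?=(\d{n})) with the global flag, which matchAll requires.
    const pattern = new RegExp(`(?=(\\d{${n}}))`, 'g');

    // Each match is zero-length; the digits are in capture group 1.
    return Array.from(input.matchAll(pattern), (m) => m[1]);
}

console.log(overlappingDigitRuns('12345', 4)); // [ '1234', '2345' ]
```

Because the iteration is handled by `matchAll`, there is no `lastIndex` bookkeeping to get wrong.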

Graham
Thomas Ayoub
0

Use a lookahead assertion that lists all the possibilities:
(?=(0123|1234|2345|3456|4567|5678|6789))

 (?=
      (                             # (1 start)
           0123
        |  1234
        |  2345
        |  3456
        |  4567
        |  5678
        |  6789 
      )                             # (1 end)
 )

Output

 **  Grp 0 -  ( pos 0 , len 0 )  EMPTY 
 **  Grp 1 -  ( pos 0 , len 4 ) 
1234  

------------------

 **  Grp 0 -  ( pos 1 , len 0 )  EMPTY 
 **  Grp 1 -  ( pos 1 , len 4 ) 
2345  
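If "consecutive" is read as digits that each increase by one, the alternation above could be generated rather than typed out. The following is only a sketch under that assumption; `ascendingRunPattern` is a hypothetical name.

```javascript
// Hypothetical helper: builds (?=(0123|1234|...)) for ascending runs of length n.
function ascendingRunPattern(n) {
    const digits = '0123456789';
    const runs = [];

    // Slide a window of width n across the digit string.
    for (let start = 0; start + n <= digits.length; start++) {
        runs.push(digits.slice(start, start + n));
    }

    return new RegExp(`(?=(${runs.join('|')}))`, 'g');
}

const ascending = Array.from('12345'.matchAll(ascendingRunPattern(4)), (m) => m[1]);
console.log(ascending); // [ '1234', '2345' ]
```

For n = 4 this produces the same seven alternatives as the handwritten pattern. Input such as `1357` gives no matches, since its digits do not increase by one.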
  • While this does indeed work, it assumes that we know all the possible outcomes in advance. In my situation, the input could be random/unpredictable. – DaveHolt Feb 25 '17 at 12:44
  • @DaveHolt - Hey partner, I assume `every combination of 4 consecutive digits` means that adjacent digits differ by a character code of +/- 1. Otherwise, _consecutive_ has no special meaning when used with the term _combination_; it's simply a character class `[0-9]`. If you had added _unique_ to the description, that would be something entirely different. –  Feb 26 '17 at 20:10
  • Also, just a comment. All these answers use overlapped searches. This is nothing new; however, its methodology is seldom understood or described. –  Feb 26 '17 at 20:18