
I want to find all 'aa' substrings in the string 'caaab', so I used the following regular expression:

/aa/g

With this expression, I expected JavaScript's match method to return two matches, because the shared middle 'a' yields two overlapping 'aa' substrings. However, it returns only the first one. What is the problem with the regex, and how can I fix it?

let d = "caaab";
let foundArray = d.match(/aa/g); // ["aa"]: only one match is returned

Retro Code
    There is nothing wrong with the regex, it's working exactly as it's supposed to; once it consumes a character, it moves past it, otherwise it would get stuck on the first `aa` forever – ibrahim mahrir Feb 12 '20 at 16:46

2 Answers


Here is one way to approach this. First, record the length of the input string. Then do a global regex replacement of a(?=a) with the empty string, which removes every a that is immediately followed by another a. Each such a marks the start of one occurrence of aa, including overlapping ones, so the difference between the input length and the output length is the number of times aa occurred.

var input = "caaab";
var sLen = input.length;
var output = input.replace(/a(?=a)/g, "");
var eLen = output.length;
console.log("There were " + (sLen - eLen) + " occurrences of aa in the input");

Note that the difficulty you are encountering has to do with how JavaScript's regex engine consumes characters. A match on aa consumes both letters, including the a that would begin the next, overlapping aa match. Using a(?=a) gets around this problem, because the lookahead (?=a) checks for the next a without consuming it.
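To make the "consuming" behavior concrete, here is a minimal sketch that runs each pattern once and reports where the engine will resume searching. The helper name firstMatchInfo is hypothetical and not part of the original answer; it relies only on the standard exec method and lastIndex property of a global regex.

```javascript
// Hypothetical helper (an illustration, not from the original answer):
// runs a global regex once and reports the matched text, where it started,
// and the index at which the next search would begin.
function firstMatchInfo(regex, input) {
  regex.lastIndex = 0; // start from the beginning of the input
  const m = regex.exec(input);
  if (m === null) return null;
  return { text: m[0], index: m.index, resumeAt: regex.lastIndex };
}

// /aa/g matches "aa" at index 1 and resumes at index 3,
// so the overlapping "aa" starting at index 2 is skipped.
console.log(firstMatchInfo(/aa/g, "caaab"));

// /a(?=a)/g matches "a" at index 1 and resumes at index 2,
// so the next a is still available for the following match.
console.log(firstMatchInfo(/a(?=a)/g, "caaab"));
```

The only difference between the two results is the resume position (3 versus 2), which is why the second pattern can see the overlapping occurrence.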

Tim Biegeleisen

Use a lookahead

As mentioned in a comment, that's how regexes are designed to work:

it's working exactly as it's supposed to; once it consumes a character, it moves past it

Matches do not overlap. This isn't a limitation of JS; it's simply how regular expressions work.

The way to get around that is to use a zero-length match, i.e. a look-ahead or look-behind.

Tim's existing answer already does this, but can be simplified as follows:

const match = "caaab".match(/a(?=a)/g);
console.log(match); // ["a", "a"]

This finds each a that is followed by another a (the second a is not returned as part of the match). So technically it's finding:

caaab
 ^ first match, single character
  ^ second match, single character
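If the full two-character substrings and their positions are wanted rather than single characters, one possible variation is to put the whole pattern inside the lookahead and capture it. This is a sketch under that assumption, not something from the answers above; the function name findOverlappingAa is hypothetical, and it uses the standard String.prototype.matchAll, which requires a global regex.

```javascript
// Hypothetical helper (an illustration only): returns every occurrence of
// "aa", including overlapping ones, together with its start index.
function findOverlappingAa(input) {
  // (?=(aa)) matches the empty string at each position where "aa" begins
  // and captures the two characters in group 1 without consuming them.
  const pattern = /(?=(aa))/g;
  return Array.from(input.matchAll(pattern), (m) => ({
    text: m[1],
    index: m.index,
  }));
}

// For "caaab" this yields two entries: "aa" at index 1 and "aa" at index 2.
console.log(findOverlappingAa("caaab"));
```

Because each match is zero-length, nothing is consumed, so every starting position is examined.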
AD7six