Trying to find indices of all regex matches, but some being left out

Question

I want to find the index of every vowel after the first "e" in a string.

You can't get the index of a capture group directly from RegExp.exec(sInput), but you can get the length of a capture group containing everything in front of it. The regex I'm using for this is /(.*?e.*?)(a|e|i|o|u)(.*)/.

So the setup is basically like this:

let re = /(.*?e.*?)(a|e|i|o|u)(.*)/g;
let sInput = "lorem ipsum";

let tMatches = [];
let tMatchIndices = [];
let iPrevIndex = 0;
let result;

while ((result = re.exec(sInput)) !== null) {
    /*  result[0]: full match
        result[1]: match for 1st capture group (.*?e.*?)
        result[2]: match for 2nd capture group (a|e|i|o|u)
        result[3]: match for 3rd capture group (.*)
    */
    let index = result[1].length + iPrevIndex;
    let sMatch = result[2];
    tMatchIndices.push(index);
    tMatches[index] = sMatch;
    iPrevIndex = index + sMatch.length;
    re.lastIndex = iPrevIndex;
}

// tMatches is sparse (keyed by string index), so loop over tMatchIndices
for (let i = 0; i < tMatchIndices.length; i++) {
    let index = tMatchIndices[i];
    console.log(tMatches[index] + " at index " + index);
}
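On engines that support the `d` (hasIndices) flag, added in ES2022, exec() can report capture-group positions directly through the match's `indices` property, so the prefix-length bookkeeping is not strictly needed. A minimal sketch, assuming such an engine (a recent Node.js or browser); `reWithIndices` and `match` are illustrative names:

```javascript
// Same pattern as above, but with the d flag instead of g.
// With d, the match result gains an `indices` array of [start, end] pairs,
// one per capture group (indices[0] is the whole match).
const reWithIndices = /(.*?e.*?)(a|e|i|o|u)(.*)/d;
const match = reWithIndices.exec("lorem ipsum");

// Group 2 is the vowel; these are its start and end positions in the input.
console.log(match.indices[2]); // [6, 7]
```

This only replaces the length arithmetic: the trailing (.*) still consumes the rest of the string, so on its own it does not recover the "u".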

The issue is that for the input string "lorem ipsum" I need the indices of both the "i" and the "u", but it only gives me the index of the "i".

I know why it's doing this - advancing the search index past the first match cuts out the "e" that's supposed to trigger the next match. What I'm stuck on is how to fix it. I can't just simply not advance the search index, or it would never move past the first match anyway.
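For comparison, a minimal sketch of one way around this while still using an exec() loop (the function name `vowelIndicesAfterFirstE` is hypothetical): find the first "e" once with indexOf(), then start a plain global vowel regex just past it by setting lastIndex. Because the "e" only has to be located once, later exec() calls no longer depend on it, and each match's `index` property gives its position in the original string.

```javascript
// Hypothetical helper: indices of every vowel after the first "e".
function vowelIndicesAfterFirstE(str) {
  const start = str.indexOf("e");
  if (start === -1) return []; // no "e": nothing to report

  const vowel = /[aeiou]/g;
  vowel.lastIndex = start + 1; // begin searching right after the first "e"

  const indices = [];
  let m;
  while ((m = vowel.exec(str)) !== null) {
    indices.push(m.index); // exec() reports where each match starts
  }
  return indices;
}

console.log(vowelIndicesAfterFirstE("lorem ipsum")); // [6, 9]
```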

I've thought about simply deleting each match from the search string as I go along, but then that shifts the index of every character after it to the left, so the indices I would collect wouldn't even be accurate for the original, untruncated string.

How can I fix this?

score 0 · Answer 1 · answered Feb 14 '21 at 04:36

One approach to keep things simple would be to strip off the leading substring up to, and including, the first e. Then, just iterate the remaining string one character at a time, checking for vowels along the way.

const sInput = "lorem ipsum";
// Drop the leading text up to and including the first "e"
const nInput = sInput.replace(/^.*?(?:e|$)/, "");
// Position in sInput where nInput begins
const index = sInput.length - nInput.length;
const indices = [];
for (let i = 0; i < nInput.length; i++) {
    if (/[aeiou]/.test(nInput.charAt(i))) {
        indices.push(i + index);
    }
}

console.log(indices);

Regarding the output:

01234567890
lorem ipsum
      ^  ^  [6, 9]

Peter Thoeny · Answer 2 · 2021-02-25T19:32:57.577


You can do that with a positive lookbehind:

'lorem ipsum'.replace(/(?<=e.*)[aiueo]/g, function(m, offset) {
  console.log(m + ' ==> ' + offset)
});

Output:

i ==> 6
u ==> 9

Explanation:

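(?<=e.*) - positive lookbehind: only match if an e appears somewhere before the current position
[aiueo] - match a single vowel
g flag - find all matches, not just the first
replacer function - its second argument is the offset of the match, since the regex has no capture groups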
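One caveat worth checking before relying on this with multi-line input: in JavaScript, `.` does not match line terminators unless the `s` (dotAll) flag is set, so `(?<=e.*)` cannot see an "e" on an earlier line. A small sketch comparing both flag sets; the sample string is made up for illustration, and matchAll() is used only to collect the offsets into arrays:

```javascript
// Input where the first "e" and the later vowels are on different lines.
const text = "lorem\nipsum";

// Without the s flag, `.` cannot cross the "\n", so the lookbehind
// never reaches the "e" on the first line for vowels on line 2.
const withoutS = [...text.matchAll(/(?<=e.*)[aiueo]/g)].map(m => m.index);

// With the s (dotAll) flag, `.` also matches line terminators.
const withS = [...text.matchAll(/(?<=e.*)[aiueo]/gs)].map(m => m.index);

console.log(withoutS); // []
console.log(withS);    // [6, 9]
```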

edited Feb 25 '21 at 19:32 · answered Feb 25 '21 at 19:26

score 0 · Answer 3 · answered Jul 16 '22 at 23:15

Use String.prototype.matchAll(), which returns an iterator of matches; each match is an array with an index property:

[...str.matchAll(/(?<=^.*?e.*?)[aeiou]/gis)].map(m => m.index);

Demo:

const findIndices = str => {
  const regex = /(?<=^.*?e.*?)[aeiou]/gis;
  return [...str.matchAll(regex)].map(m => m.index);
};

console.log(findIndices('Lorem ipsum dolor sit amet.')); // [6, 9, 13, 15, 19, 22, 24]
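If the caller also needs to know which vowel was found, each match array already carries it in element 0. A sketch of a hypothetical `findVowels` helper that returns both, reusing the same regex and an upper-case sample to show the effect of the `i` flag (an "E" also counts as the trigger):

```javascript
// Hypothetical variant of findIndices that keeps the matched vowel too.
// i: case-insensitive, so "E" triggers and upper-case vowels match.
// s: lets `.` in the lookbehind cross line breaks.
const findVowels = str => {
  const regex = /(?<=^.*?e.*?)[aeiou]/gis;
  return [...str.matchAll(regex)].map(m => ({ vowel: m[0], index: m.index }));
};

console.log(findVowels("HELLO wOrld"));
// → [ { vowel: 'O', index: 4 }, { vowel: 'O', index: 7 } ]
```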
