
I want to find the index of every vowel after the first "e" in a string.

Since you can't get the index of a capture group directly from re.exec(sInput), but you can get the length of a capture group containing everything in front of the group you actually want, the regex I'm using to do this is /(.*?e.*?)(a|e|i|o|u)(.*)/.

So the setup is basically like this:

let re = /(.*?e.*?)(a|e|i|o|u)(.*)/g;
let sInput = "lorem ipsum";

let tMatches = [];
let tMatchIndices = [];
let iPrevIndex = 0;

let result;
while ((result = re.exec(sInput)) !== null) {
    /*  result[0]: full match
        result[1]: match for 1st capture group (.*?e.*?)
        result[2]: match for 2nd capture group (a|e|i|o|u)
        result[3]: match for 3rd capture group (.*)
    */
    let index = result[1].length + iPrevIndex;
    let sMatch = result[2];
    tMatchIndices.push(index);
    tMatches[index] = sMatch;
    iPrevIndex = index + sMatch.length;
    re.lastIndex = iPrevIndex;
}

for (let i = 0; i < tMatchIndices.length; i++) {
    let index = tMatchIndices[i];
    console.log(tMatches[index] + " at index " + index);
}

The issue is that for the input string "lorem ipsum", I need the indices of both the "i" and the "u", but it only gives me the index of the "i".

I know why it's doing this - advancing the search index past the first match cuts out the "e" that's supposed to trigger the next match. What I'm stuck on is how to fix it. I can't just simply not advance the search index, or it would never move past the first match anyway.
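
To make the failure concrete, here is a minimal sketch of the second search attempt, reusing the regex and input from above. The names first, second, and remaining are only for illustration:

```javascript
const re = /(.*?e.*?)(a|e|i|o|u)(.*)/g;
const sInput = "lorem ipsum";

// First search: group 1 is "lorem ", so the vowel "i" sits at index 6.
const first = re.exec(sInput);
const firstIndex = first[1].length;

// Resume just past the "i", as the loop above does.
re.lastIndex = firstIndex + first[2].length;

// Everything the regex can still see from here on: "psum".
const remaining = sInput.slice(re.lastIndex);

// There is no "e" left in the remaining text, so the pattern cannot match.
const second = re.exec(sInput);

console.log(firstIndex, JSON.stringify(remaining), second);
```

The second call returns null because the pattern still requires an "e" in front of the vowel, and the only "e" is now behind the search position.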

I've thought about simply deleting each match from the search string as I go along, but then that shifts the index of every character after it to the left, so the indices I would collect wouldn't even be accurate for the original, untruncated string.

What should I do?

Arcaeca

3 Answers

One approach to keep things simple would be to strip off the leading substring up to, and including, the first e. Then, just iterate the remaining string one character at a time, checking for vowels along the way.

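const sInput = "lorem ipsum";
// Strip everything up to and including the first "e" (or the whole string if there is none).
const nInput = sInput.replace(/^.*?(?:e|$)/, "");
// Offset of the remaining substring within the original string.
const offset = sInput.length - nInput.length;
const indices = [];
for (let i = 0; i < nInput.length; i++) {
    if (/[aeiou]/.test(nInput.charAt(i))) {
        indices.push(i + offset);
    }
}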

console.log(indices);

The output, lined up against the input string:

01234567890
lorem ipsum
      ^  ^  [6, 9]
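
The same idea could also be wrapped in a reusable function. This is only a sketch: the name vowelIndicesAfterFirstE is hypothetical, and it swaps the regex strip for indexOf so that no truncated copy of the string is needed. Like the code above, it only treats lowercase a, e, i, o, u as vowels.

```javascript
// Hypothetical helper: indices of all vowels that come after the first "e".
function vowelIndicesAfterFirstE(text) {
  const firstE = text.indexOf("e");
  if (firstE === -1) return []; // no "e", so nothing qualifies

  const indices = [];
  // Start one character past the first "e" and keep the original indices.
  for (let i = firstE + 1; i < text.length; i++) {
    if ("aeiou".includes(text[i])) indices.push(i);
  }
  return indices;
}

console.log(vowelIndicesAfterFirstE("lorem ipsum")); // [6, 9]
```
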
Tim Biegeleisen

You can do that with a positive lookbehind:

'lorem ipsum'.replace(/(?<=e.*)[aiueo]/g, function(m, offset) {
  console.log(m + ' ==> ' + offset)
});

Output:

i ==> 6
u ==> 9

Explanation:

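  • (?<=e.*) - positive lookbehind: the match must be preceded by an e, followed by any number of characters
  • [aiueo] - match a single vowel
  • the g flag makes the pattern match every occurrence instead of only the first
  • the replace callback receives the offset of each match as its second argument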
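
If you would rather keep the exec loop from the question, the same lookbehind pattern should work there too. A minimal sketch, assuming an engine that supports lookbehind, with variable names borrowed from the question:

```javascript
const sInput = "lorem ipsum";
const re = /(?<=e.*)[aeiou]/g;
const tMatchIndices = [];

let result;
// With the g flag, exec advances re.lastIndex itself after each match,
// and result.index is the position of the vowel in the original string.
while ((result = re.exec(sInput)) !== null) {
  tMatchIndices.push(result.index);
  console.log(result[0] + " at index " + result.index);
}
```

Because the lookbehind does not consume the "e", later searches can still see it, so there is no need to reset lastIndex by hand.
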
Peter Thoeny

Use String.prototype.matchAll, which returns an iterator of matches; each match is an array with an index property:

[...str.matchAll(/(?<=^.*?e.*?)[aeiou]/gis)].map(m => m.index);

Demo:

const findIndices = str => {
  const regex = /(?<=^.*?e.*?)[aeiou]/gis;
  return [...str.matchAll(regex)].map(m => m.index);
}

console.log(findIndices('Lorem ipsum dolor sit amet.'));
Przemyslaw