I am having a small problem with regex. I want to find the start and end index of each complete match in the input string.

For example, I have an array of strings like

["a", "aa"]

and a text like I like a problem aa

I iterate over the array and run one regex per string:

let arr = ["a", "aa"];
let str = "I like a problem aa";
let indicesArr = [];
arr.forEach(a=>{
  const regexObj = new RegExp(a, "gi");
  let match;
  while ((match = regexObj.exec(str))) {
    let obj = { start: match.index, end: regexObj.lastIndex }
    indicesArr.push(obj);
    if(!match.index || !regexObj.lastIndex) break;
  }
});

The code above gives me this result:

[
  {start: 7, end: 8},
  {start: 17, end: 18},
  {start: 18, end: 19},
  {start: 17, end: 19}
]

The result I want is:

[
  {start: 7, end: 8},
  {start: 17, end: 19}
]

Any suggestions would be very helpful, thanks :)

Wiktor Stribiżew
abinas patra

1 Answer

The problem here is that the pattern a matches twice inside aa, and the separate pass for aa then matches the same text again. Instead, run a single regex over the string that tries aa before a. That regex must be /aa|a/g and not /a|aa/g, because the order of alternatives matters: the engine tries them left to right and takes the first one that matches.
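As a minimal illustration of why the order of alternatives matters, the sketch below runs both orderings against the string aa. The variable names are made up for this example.

```javascript
const text = "aa";

// Alternatives are tried left to right: with "a" first, "aa" is never reached.
const shortFirst = text.match(/a|aa/g);

// With "aa" first, the longer alternative wins wherever it fits.
const longFirst = text.match(/aa|a/g);

console.log(shortFirst); // [ 'a', 'a' ]
console.log(longFirst);  // [ 'aa' ]
```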

Here, you can use

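let arr = ["a", "aa"];
let str = "I like a problem aa";
let indicesArr = [];
// Longest items first, so "aa" is tried before "a" (sorts arr in place)
arr.sort((a, b) => b.length - a.length);
// Escape regex special characters in each item, then join with | into one pattern
const regexObj = new RegExp(arr.map(x=> x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|'), "gi");
let match;
// With the g flag, each exec call resumes at regexObj.lastIndex
while (match = regexObj.exec(str)) {
    let obj = { start: match.index, end: regexObj.lastIndex }
    indicesArr.push(obj);
}
console.log(indicesArr); // [ { start: 7, end: 8 }, { start: 17, end: 19 } ]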

Note these two lines:

  • arr.sort((a, b) => b.length - a.length); - sorts the arr items by length in descending order (so that aa comes before a). Note that sort works in place, so arr itself is reordered.
  • const regexObj = new RegExp(arr.map(x=> x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|'), "gi"); - escapes the regex special characters in every item of arr so that each item is matched literally, then joins the items with the | alternation operator into a single pattern string.
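The two steps above can also be wrapped in a reusable function. This is only a sketch: findMatchRanges is a hypothetical name, and it swaps the exec loop for String.prototype.matchAll (ES2020), which is an assumption about the target runtime. It also copies the input before sorting and skips empty strings, since an empty alternative would match at every position.

```javascript
// Hypothetical helper: returns { start, end } for each non-overlapping match.
function findMatchRanges(words, text) {
  const alternatives = words
    .filter((w) => w.length > 0)            // filter returns a new array, so words is untouched
    .sort((a, b) => b.length - a.length)    // longest first, so "aa" is tried before "a"
    .map((w) => w.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&")); // match each word literally

  if (alternatives.length === 0) return [];

  const pattern = new RegExp(alternatives.join("|"), "gi");
  // matchAll requires the g flag and yields one match object per occurrence.
  return Array.from(text.matchAll(pattern), (m) => ({
    start: m.index,
    end: m.index + m[0].length,
  }));
}

const words = ["a", "aa"];
const ranges = findMatchRanges(words, "I like a problem aa");
console.log(ranges); // [ { start: 7, end: 8 }, { start: 17, end: 19 } ]
console.log(words);  // [ 'a', 'aa' ] (the input order is preserved)
```

Here end is computed from the match length instead of lastIndex, because matchAll works on an internal copy of the regex and does not update lastIndex on the original.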
Wiktor Stribiżew