
I have a string, and I want an array with the indexes (positions) of the characters in it that do not match a certain regex.

The issue here is that if I write it like this:

let match;
let reg = /[A-Za-z]|[0-9]/g;
let str = "1111-253-asdasdas";
let indexes = [];

do {
    match = reg.exec(str);
    if (match) indexes.push(match.index);
} while (match);

It works. It returns the indexes of all the characters that are numerical or alphabetical. The problem is that if I try to do the opposite, with a negative lookahead in the regex, like this:

let match;
let reg = /(?!([A-Za-z]|[0-9]))/g;
let str = "1111-253-asdasdas";
let indexes = [];

do {
    match = reg.exec(str);
    if (match) indexes.push(match.index);
} while (match);

It ends up in an infinite loop.

What I'd like to achieve is the same result as in the first case, but with the negative regex, so in this case the result would be:

indexes = [4, 8]; // which are the indexes in which a non-alphanumerical character appears

Is the loop wrong, or is the regex the one messing things up? Maybe exec does not work with negative lookahead expressions?

I would understand the regex not working as I intended (because it may be wrongly written), but I don't understand the infinite loop, which leads me to think that exec may not be the best way to achieve what I'm looking for.

Wiktor Stribiżew
Unapedra
  • The infinite loop is easy to explain: the regex has a `g` modifier and thus tries to match multiple occurrences of the pattern, but since your pattern matches an empty string, and you do not check whether `index` is equal to `lastIndex`, the regex cannot advance in the string. Use a regex to match any non-alphanumeric chars, `/[\W_]/g` – Wiktor Stribiżew Mar 04 '19 at 11:10
  • Can't you just change the regex so that it searches for non-alphanumeric characters: `/[^A-Za-z0-9]/`? – Robin Zigmond Mar 04 '19 at 11:10
  • @WiktorStribiżew thank you for the explanation, as this was the reason of my question, because I did not understand why this was happening. – Unapedra Mar 04 '19 at 11:19
  • @RobinZigmond thank you because that's exactly what I was trying to achieve! – Unapedra Mar 04 '19 at 11:19
  • @WiktorStribiżew I reopened because your duplicate link was not specific enough. – Tim Biegeleisen Mar 04 '19 at 11:23
  • Sorry @WiktorStribiżew, can you post your comment as an answer so I can accept it? It explains why the `exec` ends in an infinite loop, and furthermore brings a working answer for my case, which solves the problem. Thank you! – Unapedra Mar 04 '19 at 11:26
  • @Unapedra Yes, see below. – Wiktor Stribiżew Mar 04 '19 at 11:30

2 Answers

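This approach replaces every character that matches the negated character class (that is, every non-alphanumeric character) with a star *. Then we iterate over the replaced string and collect the indices of the stars. Because * is itself non-alphanumeric, a star already present in the input is still reported correctly.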

var str = "1111-253-asdasdas";
var pattern = /[^A-Za-z0-9]/g;
// Mark every non-alphanumeric character with a star
var marked = str.replace(pattern, "*");

var indices = [];
for (var i = 0; i < marked.length; i++) {
    if (marked[i] === "*") indices.push(i);
}
console.log(indices.toString()); // 4,8

In this case, only the characters at positions 4 and 8 are replaced, because they are hyphens, the only non-alphanumeric characters in the string.
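As a side note, the same indices can probably be collected without rewriting the string at all. The following is only a sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the variable names are made up for illustration and are not part of the answer above.

```javascript
// Sketch: collect the index of every non-alphanumeric character directly.
// matchAll requires the g flag and works on a copy of the regex,
// so no lastIndex bookkeeping is needed here.
const input = "1111-253-asdasdas";
const nonAlnumIndexes = Array.from(
    input.matchAll(/[^A-Za-z0-9]/g),
    (m) => m.index
);
console.log(nonAlnumIndexes); // [4, 8]
```

Each match here consumes exactly one character, so the iteration always moves forward.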

Tim Biegeleisen

Reason

The infinite loop is easy to explain: the regex has the g modifier, so exec tries to match multiple occurrences of the pattern, starting each attempt at the end of the previous successful match, that is, at the lastIndex value.

See the exec documentation:

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property

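However, your pattern matches an empty string, so after each match lastIndex is equal to the match index. Since the loop does not check whether index is equal to lastIndex, the regex cannot advance through the string, and exec keeps returning the same empty match.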
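The stall can be observed safely by calling exec a fixed number of times instead of looping until null. This is a minimal sketch; the simplified lookahead and the `observed` array are illustrative choices, not part of the original question.

```javascript
// Sketch: call exec() three times on a zero-width pattern and record
// where each match lands and what lastIndex is afterwards.
const stalledReg = /(?![A-Za-z0-9])/g; // matches the empty string before a non-alphanumeric char
const sample = "1111-253-asdasdas";

const observed = [];
for (let i = 0; i < 3; i++) {
    const m = stalledReg.exec(sample);
    observed.push({
        index: m.index,           // where the empty match was found
        length: m[0].length,      // 0, since nothing is consumed
        lastIndex: stalledReg.lastIndex, // where the next search will start
    });
}
console.log(observed);
// Every entry is { index: 4, length: 0, lastIndex: 4 }:
// the search restarts at 4 and finds the same empty match again.
```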

Solution

Use a regex that matches any non-alphanumeric char, /[\W_]/g. Since it does not match empty strings, the lastIndex property of the RegExp object changes with each match and no infinite loop occurs.

JS demo:

let match, indexes = [];
let reg = /[\W_]/g;
let str = "1111-253-asdasdas";

// Each match consumes one character, so lastIndex advances on every call
while ((match = reg.exec(str)) !== null) {
    indexes.push(match.index);
}
console.log(indexes); // [4, 8]

Also, see how to move the lastIndex property value manually.
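For completeness, here is a rough sketch of what moving lastIndex manually could look like if the zero-width lookahead were kept. The helper name `zeroWidthIndexes` is hypothetical, and the extra end-of-string check is an assumption about the desired result: the lookahead also succeeds at the very end of the string (index 17 here), which is not a character position.

```javascript
// Sketch: loop over a zero-width global regex without stalling.
// Assumes `reg` has the g flag; without it exec ignores lastIndex.
function zeroWidthIndexes(reg, str) {
    if (!reg.global) throw new TypeError("regex must have the g flag");
    const indexes = [];
    reg.lastIndex = 0; // start from the beginning of the string
    let match;
    while ((match = reg.exec(str)) !== null) {
        // Empty match: lastIndex did not move, so step forward by hand.
        if (match.index === reg.lastIndex) reg.lastIndex++;
        // Skip the empty match at the end of the string.
        if (match.index < str.length) indexes.push(match.index);
    }
    return indexes;
}

console.log(zeroWidthIndexes(/(?![A-Za-z0-9])/g, "1111-253-asdasdas")); // [4, 8]
```

The character class version remains simpler, since it needs neither the manual increment nor the end-of-string check.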

Wiktor Stribiżew