2

I have the following little example with the regex /-+|(?<=: ?).*/gm. But this leads to an infinite loop in Node/Chrome and an "Invalid regex group" error in Firefox.

When I change this to /-+|(?<=: ).*/gm (leaving out the ? quantifier in the lookbehind), it runs, but of course I don't get the lines that contain no value after the :.

If I change the regex to /-+|(?<=:).*/gm (leaving the space out of the lookbehind), I again run into an infinite loop/error.

Can anyone explain this behaviour to me, and tell me what regex I would have to use to also match the lines that end with a colon? I'd love to understand...

const text = `
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
`;

const pattern = /-+|(?<=: ?).*/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    console.log(`"${res[0]}"`);
} 

EDIT:

The expected output is:

"-------------------------------------"
"5048603"
""
"asjhgg | a3857"
"Something..."
"-------------------------------------"
"5048603"
""
"asjhgg | a3857"
"Something..."
"-------------------------------------"
Juarrow

4 Answers

3

The (?<=...) lookaround is a positive lookbehind. At the time of writing it is not yet supported in Firefox (see supported environments here), so you will always get an exception there until it is implemented.

The /-+|(?<=: ?).*/ pattern can match an empty string, which is a very typical "pathological" kind of pattern. With the g flag, each exec call starts searching at lastIndex and, after a successful match, sets lastIndex to the end of that match. When the match has zero length, the end equals the start, so lastIndex does not advance, the next exec call finds the same empty match at the same position, and you end up in an infinite loop. See here how to move the lastIndex properly to avoid infinite loops in these cases.
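To make the stalled lastIndex visible, here is a minimal sketch. The pattern /a*/g and the input "baac" are made up for illustration (they are not from the question); they were chosen only because a* can match the empty string without needing lookbehind support.

```javascript
// Hypothetical demo pattern: a* can match the empty string.
const re = /a*/g;
const input = "baac";

// A single exec() call: the match at index 0 is empty, so lastIndex stays at 0.
const first = re.exec(input);
const lastIndexAfterFirst = re.lastIndex;
console.log(first.index, JSON.stringify(first[0]), lastIndexAfterFirst); // 0 "" 0

// Collect all matches, bumping lastIndex by hand after every empty match.
// Without the bump, this loop would never terminate.
re.lastIndex = 0;
const found = [];
let m;
while ((m = re.exec(input)) !== null) {
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
  found.push(m[0]);
}
console.log(found); // [ '', 'aa', '', '' ]
```

The same guard applies to any global pattern that can succeed without consuming characters.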

From what I see, you want to remove the beginning of each line up to and including the first :, plus any whitespace that follows it. You may use

text.replace(/^[^:\r\n]+:[^\S\r\n]*/gm, '')
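As a rough illustration of what that replacement does, here is a sketch on a shortened version of the sample text (the variable names sampleText and stripped are only for this example):

```javascript
// Shortened sample, joined with "\n" so the lines are easy to read.
const sampleText = [
  "-----",
  "Prop Name: 5048603",
  "Prop2 Name:",
  "Bla bla bla: asjhgg | a3857",
  "-----",
].join("\n");

// Remove everything from the start of a line up to the first colon,
// plus any horizontal whitespace right after it.
const stripped = sampleText.replace(/^[^:\r\n]+:[^\S\r\n]*/gm, "");

console.log(stripped.split("\n"));
// [ '-----', '5048603', '', 'asjhgg | a3857', '-----' ]
```

Lines without a colon (the dashes) are left untouched, and a line that ends with a colon becomes an empty line.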

Or, if you want to actually extract the lines that consist only of - characters, together with whatever follows the : on the other lines, you may use

const text = `
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
`;

const pattern = /^-+$|:[^\S\r\n]*(.*)/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    // res[1] is undefined when the dashes alternative matched (group 1 did not participate)
    if (res[1] !== undefined) {
      console.log(res[1]);
    } else {
      console.log(res[0]);
    }
}
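To check the extraction pattern against the expected output listed in the question, the loop can be wrapped in a small helper. This is only a sketch: the function name extractValues is hypothetical, and the sample is shortened to one record.

```javascript
// Hypothetical helper: returns the dash lines and the values after the first colon.
function extractValues(input) {
  const pattern = /^-+$|:[^\S\r\n]*(.*)/gm;
  const values = [];
  let res;
  // Every match consumes at least one character (a dash or a colon),
  // so lastIndex always advances and the loop terminates.
  while ((res = pattern.exec(input)) !== null) {
    values.push(res[1] !== undefined ? res[1] : res[0]);
  }
  return values;
}

const record = [
  "-----",
  "Prop Name: 5048603",
  "Prop2 Name:",
  "Bla bla bla: asjhgg | a3857",
  "Location: Something...",
  "-----",
].join("\n");

console.log(extractValues(record));
// [ '-----', '5048603', '', 'asjhgg | a3857', 'Something...', '-----' ]
```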
Wiktor Stribiżew
0

Try this pattern: /(.*):(.*)/mg

const regex = /(.*):(.*)/mg;
const str = `-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Mido elhawy
0

Up front: Wiktor's answer is the answer to make it work cross-browser.

For anyone who is interested in how to get this to work in Chrome with the "original" pattern (thanks to Wiktor's answer, which points out that lastIndex is not incremented on zero-length matches):

const pattern = /-+|(?<=: ?).*/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    if(res.index === pattern.lastIndex)
        pattern.lastIndex++;
    console.log(`"${res[0]}"`);
}
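An alternative sketch, assuming an engine that supports both lookbehind and String.prototype.matchAll: the matchAll iterator advances past empty matches on its own, so no manual lastIndex handling is needed. One thing to be aware of, as far as I can tell: because the space in the lookbehind is optional, the lookbehind is already satisfied directly after the colon, so the matched values start with the space. The sample below is shortened, and the variable names are only for illustration.

```javascript
const sample = [
  "-----",
  "Prop Name: 5048603",
  "Prop2 Name:",
  "Location: Something...",
  "-----",
].join("\n");

// Same pattern as above; requires lookbehind support in the engine.
const lookbehindPattern = /-+|(?<=: ?).*/gm;

// matchAll steps over zero-length matches itself, so this terminates.
const rawMatches = Array.from(sample.matchAll(lookbehindPattern), (m) => m[0]);
console.log(rawMatches);
// [ '-----', ' 5048603', '', ' Something...', '-----' ]

// Drop the leading space that the optional " ?" leaves in the match.
const trimmedMatches = rawMatches.map((value) => value.trimStart());
console.log(trimmedMatches);
// [ '-----', '5048603', '', 'Something...', '-----' ]
```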
Juarrow
-1

A Regex lookahead is defined like this (?=pattern) and not (pattern?)

https://www.regular-expressions.info/lookaround.html

John Caprez
  • I'm not using (pattern?), i'm using (?<=pattern) which should be a lookbehind. (Fixed the question title, where i was saying 'lookahead') – Juarrow May 16 '20 at 22:14
  • Lookbehind is (?!pattern) – John Caprez May 16 '20 at 22:15
  • (?=foo) => lookahead, (?<=foo) => lookbehind, (?!foo) => negative lookahead, (?<!foo) => negative lookbehind – Juarrow May 16 '20 at 22:18
  • In PHP but not in JavaScript I think. Or only in a few browsers: https://caniuse.com/#feat=js-regexp-lookbehind – John Caprez May 16 '20 at 22:26
  • https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp (Lookahead does not seem to exist, lookbehind seems to be the same) – Juarrow May 16 '20 at 22:28