
I am currently lost in a JavaScript regex replace call.

The Problem

This medium-sized regex

/import +{ *(?<export>([a-zA-Z0-9]+,?)+) *} +from +"(?<file>(\.{1,2}|([a-zA-Z0-9-_]+))((?=\/)\/[a-zA-Z0-9-_]+)*)"\s*;/g

should match import statements in a JavaScript file. For example:

import {myFunc} from "./mylib";

This works when trying it in this online tool (choosing the ECMAScript flavor), at least on initial load. If I change something in the input text, it loses the match, and undoing the change does not help.

The behaviour with actual JavaScript is similarly weird: the regex works most of the time if I call replace with a simple string. Sometimes I reload the script and it does not work with the same input.

let working = text.replace(regex, 'replaced');

But when providing a "replacer function" it never works.

let not_working = text.replace(regex,function (_, g){return 'replaced';});

It always returns the original string.
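For reference, here is a minimal sketch of how I understand the two call styles are supposed to behave. The pattern below is a deliberately simplified stand-in (an assumption for illustration, not the regex from the question): it keeps the two named groups but has no lookahead and accepts only a single export name. With named groups, the replacer function receives the groups object as its last argument, and a replacement string can refer to a group as `$<name>`.

```javascript
// Simplified illustrative pattern (NOT the regex from the question):
// one export name, a path made of dots, slashes, letters, digits, "_" and "-".
const importPattern =
  /import +\{ *(?<export>[a-zA-Z0-9]+) *\} +from +"(?<file>[.\/a-zA-Z0-9_-]+)"\s*;/g;

const source = 'import {myFunc} from "./mylib";';

// 1) Plain replacement string.
const withString = source.replace(importPattern, "replaced");

// 2) Replacement string that refers to a named group.
const withNamedRef = source.replace(importPattern, "$<file>");

// 3) Replacer function. Its arguments are: the full match, one argument per
//    capture group, the match offset, the whole input string and, because the
//    pattern has named groups, a groups object as the last argument.
const withFunction = source.replace(importPattern, (...args) => {
  const groups = args[args.length - 1];
  return `const {${groups.export}} = require("${groups.file}");`;
});

console.log(withString);
console.log(withNamedRef);
console.log(withFunction);
```

Under these assumptions all three calls should replace the statement, so a replacer function that never fires would point at the pattern or the engine rather than at the call style.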

What i have done so far

I have already looked into similar questions, like this one, where resetting the regex was the issue.
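To illustrate what "resetting the regex" means there: a regex with the g flag remembers where its last match ended in `lastIndex`, so repeated `test()` or `exec()` calls on the same object can alternate between success and failure. The sketch below uses a tiny stand-in pattern (an assumption, chosen only to keep the example short). As far as I understand the specification, `replace()` with a global regex resets `lastIndex` to 0 itself, so this effect alone should not explain a failing `replace()`.

```javascript
// Tiny stand-in pattern, only to show the stateful behaviour of the g flag.
const globalPattern = /import/g;
const line = 'import {myFunc} from "./mylib";';

// First call matches at index 0 and moves lastIndex past the match.
const first = globalPattern.test(line);
const indexAfterFirst = globalPattern.lastIndex;

// Second call starts searching at lastIndex, finds nothing,
// and resets lastIndex to 0.
const second = globalPattern.test(line);

// Resetting lastIndex by hand makes the next call start from the beginning.
globalPattern.lastIndex = 0;
const third = globalPattern.test(line);

// replace() with a global regex starts from index 0 regardless of lastIndex.
globalPattern.lastIndex = 6;
const replaced = line.replace(globalPattern, "IMPORT");

console.log(first, indexAfterFirst, second, third, replaced);
```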

Also, using '$1' to replace the capture groups, as proposed here, didn't work for me.

My best guess

The random nature of the behaviour made me wonder whether this might be a catastrophic backtracking issue, but then the code would stop executing, wouldn't it? Also, when the 'regex101' tool does find the match, it only takes 0.6 ms, which I think speaks against that.

An interesting note: it has to be tied to this specific regex, as removing the last semicolon from it makes it work 100% of the time.

/import +{ *(?<export>([a-zA-Z0-9]+,?)+) *} +from +"(?<file>(\.{1,2}|([a-zA-Z0-9-_]+))((?=\/)\/[a-zA-Z0-9-_]+)*)"\s*/g

I guess I could work with that, but I want to understand what leads to the described behaviour, to prevent future pitfalls. In particular, I want to know why the regex works in one call but fails in another, when I thought both should behave the same.

Examples

Environment

Node version: "node v18.2.0"

Browser: "Version 102.0.5005.115 (Official Build) (64-Bit)"

Edit 1

I was able to further simplify the problem, as it only seems to have a problem with the 'file' part:

/"(?<file>(\.{1,2}|([a-zA-Z0-9-_]+))((?=\/)\/[a-zA-Z0-9-_]+)*)"\s*;/g

More specifically, as @Amadan pointed out, the lookahead (?=\/) seems to cause the issue, as removing it fixes the problem.

Now I also realize that I actually wanted "conditional lookaheads" (?(?=\/)[a-zA-Z0-9-_]+|), but it does not really make sense here.

Still, I would say the behaviour of the JavaScript regex engine is unexpected.
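A sketch of the pattern with the lookahead removed, since `(?=\/)\/` should match exactly what `\/` matches. The rewrite of the export list from `([a-zA-Z0-9]+,?)+` to `name(,name)*,?` is an additional change of my own, made to avoid the nested quantifier; I believe it accepts the same inputs, but that is an assumption. The helper `parseImport` is a hypothetical name used only for this example.

```javascript
// Same intent as the original pattern, but without the lookahead and
// without a quantified group that itself contains a quantifier.
const importPattern =
  /import +\{ *(?<export>[a-zA-Z0-9]+(?:,[a-zA-Z0-9]+)*,?) *\} +from +"(?<file>(?:\.{1,2}|[a-zA-Z0-9_-]+)(?:\/[a-zA-Z0-9_-]+)*)"\s*;/g;

// Hypothetical helper: returns the named groups of the first import
// statement in the line, or null when nothing matches.
// matchAll works on a copy of the regex, so lastIndex on importPattern
// is not touched.
function parseImport(line) {
  const [match] = line.matchAll(importPattern);
  return match
    ? { export: match.groups.export, file: match.groups.file }
    : null;
}

console.log(parseImport('import {myFunc} from "./mylib";'));
console.log(parseImport('import {a,b} from "../lib/util";'));
// Every slash must be followed by at least one name character,
// so a bare "./" is rejected.
console.log(parseImport('import {myFunc} from "./";'));
```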

ObjectName
  • 1
    Isn't `(?=\/)\/` identical to `\/`? I don't see why it doesn't work, but I think removing the useless lookahead fixes it as well (on regex101). Also, your codepen gives me "working : readline; not working : replaced;", so I can't seem to replicate your error (Chrome 102.0.5005.115 (Official Build) (64-bit)) – Amadan Jun 19 '22 at 03:06
  • 1
    Same on Node.js v18.3.0, working fine (or at least I could not reproduce the error). You may want to specify which JS environment you are getting your error in, just in case. Additionally, `([a-zA-Z0-9]+,?)+` is a catastrophic backtracking pattern, I believe: `ab` can be matched with one repetition of "two characters and an optional comma", or two repetitions of "a character and an optional comma". – Amadan Jun 19 '22 at 03:22
  • Thanks for having a look at it. Sorry, I seem to have accidentally overwritten the codepen while still figuring out the issue myself; the semicolon at the end of the regex was apparently missing. This should be the regex: `/import +{ *(?<export>([a-zA-Z0-9]+,?)+) *} +from +"(?<file>(\.{1,2}|([a-zA-Z0-9-_]+))((?=\/)\/[a-zA-Z0-9-_]+)*)"\s*;/g` I have added it back to the codepen. As for `(?=\/)\/`, I was trying to match a next element only if a slash is found. In other words, every slash has to be followed by at least one [a-zA-Z0-9-_] character, so that "./" is not allowed. – ObjectName Jun 19 '22 at 10:24
