Node.js RegEx segmentation fault on simple string

Question

I want to match all URLs in a string and use a regular expression for this purpose. On some strings, I encountered that the RegEx matching fails with a segmentation fault. Here is a failing example: .D0.9B.D0.B5.D1.82.D0.BE.D0.BF.D0.B8.D1.81.D1.8C.C2.A0.D1.80.D1.83.D1.81.D1.81.D0.BA.D0.BE.D0.B9.C2.A0.D0.B8.D1.81.D1.82.D0.BE.D1.80.D0.B8.D0.B8_20_.D0.B8.D1.8E.D0.BD.D1.8F

The RegEx applied:

let string = ".D0.9B.D0.B5.D1.82.D0.BE.D0.BF.D0.B8.D1.81.D1.8C.C2.A0.D1.80.D1.83.D1.81.D1.81.D0.BA.D0.BE.D0.B9.C2.A0.D0.B8.D1.81.D1.82.D0.BE.D1.80.D0.B8.D0.B8_20_.D0.B8.D1.8E.D0.BD.D1.8F";
this.regEx = new RegExp(
    "(?:http(s)?:\/\/)?(?:www.)?(?:((?:[-a-z0-9]+[.])*){0,242}((?:[a-z2-7]{16}|[a-z2-7]{56})\.onion)(?::[0-9]{1,6})?)(\/(?:[^\"\'\s]+|$))?",
    "ugi"
);
this.regEx.exec(string);

If I apply the RegEx using other techniques (e.g. on regexr.com) where it seems to work. Executed on Node.js (v8.11.3), it results in the following exception:

Segmentation fault (core dumped)
npm ERR! code ELIFECYCLE
npm ERR! errno 139
npm ERR! uriextractor@0.0.1 debugHost: `node --inspect-brk --use_strict extractor.js`
npm ERR! Exit status 139
npm ERR! 
npm ERR! Failed at the uriextractor@0.0.1 debugHost script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/user/.npm/_logs/2018-06-27T12_15_04_809Z-debug.log

Now I am wondering: Does anybody have an idea where the problem might come from? And how should I debug such an issue?

As of now, I tried to simplify the regex and visualizing it, since I find it easier to see where it might take a wrong turn. The Chrome Dev tools are not much of a help when it comes to debugging issues with the RegEx itself it seems.

You need to double the backslashes for `\.` and `\s` or else use a regexp literal. — Pointy, Jun 27 '18 at 14:41
I think a regex like yours is not the best way to do it. Even trying to match `(?:((?:[-a-z0-9]+[.])*){0,242})` on your string produces 174627 steps. Check if you can optimize some stuff. For example, `[.]` is just `.`. Check out for the quantifiers too. **Edit:** this can help : https://www.regular-expressions.info/catastrophic.html — Seblor, Jun 27 '18 at 14:47
Also that test string appears to have nothing whatsoever to do with the sort of patterns the regex is designed to match. I suspect the problem is with that `*` in the middle but it's hard to say. — Pointy, Jun 27 '18 at 14:48
@Seblor did you use some sort of tool to analyze that? I'm not questioning what you wrote; I'm just curious because a tool like that would be useful. *edit* oh awesome thanks. — Pointy, Jun 27 '18 at 14:49
@Pointy I use http://regex101.com for building and testing regular expressions. — Seblor, Jun 27 '18 at 14:49
@Pointy: I stripped down everything else that was working, I am not particularly interested in this string, except that it fails my regex ;) — jogli5er, Jun 27 '18 at 15:02
@Seblor: Agree, I'll try to break it further down. For the [.]: sorry, this must have happened while simplifying the regex... — jogli5er, Jun 27 '18 at 15:03
You really need to use `^` and move the `$` outwards because otherwise this matches `http://www.foo.onion@www.evil.com/` and `http://evil.com/#http://www.foo.onion/` while unnecessarily rejecting `http://www.foo.onion./`. Be very careful trying to filter URLs with regular expressions. Maybe use https://nodejs.org/api/url.html to validate URLs. — Mike Samuel, Jun 27 '18 at 15:15
Thank you all for your help, the hints about pulling out the quantifiers and verifying with regex101 helped much. Thank you also for pointing out other errors in the regex — jogli5er, Jun 27 '18 at 21:42

Node.js RegEx segmentation fault on simple string

0 Answers0