I want to match all URLs in a string and use a regular expression for this purpose. On some strings, I encountered that the RegEx matching fails with a segmentation fault. Here is a failing example: .D0.9B.D0.B5.D1.82.D0.BE.D0.BF.D0.B8.D1.81.D1.8C.C2.A0.D1.80.D1.83.D1.81.D1.81.D0.BA.D0.BE.D0.B9.C2.A0.D0.B8.D1.81.D1.82.D0.BE.D1.80.D0.B8.D0.B8_20_.D0.B8.D1.8E.D0.BD.D1.8F
The RegEx applied:
let string = ".D0.9B.D0.B5.D1.82.D0.BE.D0.BF.D0.B8.D1.81.D1.8C.C2.A0.D1.80.D1.83.D1.81.D1.81.D0.BA.D0.BE.D0.B9.C2.A0.D0.B8.D1.81.D1.82.D0.BE.D1.80.D0.B8.D0.B8_20_.D0.B8.D1.8E.D0.BD.D1.8F";
this.regEx = new RegExp(
"(?:http(s)?:\/\/)?(?:www.)?(?:((?:[-a-z0-9]+[.])*){0,242}((?:[a-z2-7]{16}|[a-z2-7]{56})\.onion)(?::[0-9]{1,6})?)(\/(?:[^\"\'\s]+|$))?",
"ugi"
);
this.regEx.exec(string);
If I apply the RegEx using other techniques (e.g. on regexr.com) where it seems to work. Executed on Node.js (v8.11.3), it results in the following exception:
Segmentation fault (core dumped)
npm ERR! code ELIFECYCLE
npm ERR! errno 139
npm ERR! uriextractor@0.0.1 debugHost: `node --inspect-brk --use_strict extractor.js`
npm ERR! Exit status 139
npm ERR!
npm ERR! Failed at the uriextractor@0.0.1 debugHost script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /home/user/.npm/_logs/2018-06-27T12_15_04_809Z-debug.log
Now I am wondering: Does anybody have an idea where the problem might come from? And how should I debug such an issue?
As of now, I tried to simplify the regex and visualizing it, since I find it easier to see where it might take a wrong turn. The Chrome Dev tools are not much of a help when it comes to debugging issues with the RegEx itself it seems.