I am modifying a Node native extension that spawns native threads to do some processing. I'd like the JavaScript code to provide a filter so that the processing can exclude some data.

At this point, I'm passing a JS RegExp string from JS to C++, creating a std::regex instance from it, and passing it around the different structures down to the native thread logic.
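A minimal sketch of that flow, not the actual addon code: the pattern string is compiled once, syntax errors are caught instead of propagating, and the worker logic only receives a const reference to the compiled regex. The names `compile_filter` and `exclude_matching` are hypothetical, and the sketch assumes the pattern has already been extracted from the JS arguments as a `std::string`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compiles the pattern string received from JS.
// Returns false instead of throwing when std::regex rejects the syntax.
bool compile_filter(const std::string& pattern, std::regex& out) {
  try {
    // ECMAScript is the default grammar; it is spelled out here for clarity.
    out = std::regex(pattern, std::regex::ECMAScript);
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}

// Hypothetical worker-side helper: keeps only the items that the
// filter does NOT match in full (std::regex_match is a whole-string match).
std::vector<std::string> exclude_matching(const std::vector<std::string>& items,
                                          const std::regex& filter) {
  std::vector<std::string> kept;
  for (const std::string& item : items) {
    if (!std::regex_match(item, filter)) {
      kept.push_back(item);
    }
  }
  return kept;
}
```

Called with the pattern `^tmp/.*$` and the items `tmp/a`, `src/b`, `tmp/c/d`, `exclude_matching` would keep only `src/b`. Handing workers a `const std::regex&` means they only read the compiled object, which, as far as I know, is the usual way to share a single compiled regex between threads.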

The problem is that although std::regex appears to accept the same syntax as ECMAScript regular expressions, its behavior is not the same.

My original plan was to rely on V8's RegExp engine somehow, calling its C++ internals directly instead of going from C++ to JS and back, but I wasn't able to find out how to do this.

As an example, see the following programs, which use the same regex but yield different results:

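#include <cstdio>
#include <regex>
#include <string>

int main() {
  // Backslashes are doubled for the C++ string literal.
  std::regex re("^(?:(?:(?!(?:\\/|^)\\.).)*?\\/c)$");
  std::smatch match;
  std::string input("a.b/c");
  // regex_match returns a bool and succeeds only if the whole input matches.
  bool matched = std::regex_match(input, match, re);
  if (matched) {
    std::printf("ok\n");
  } else {
    std::printf("nok\n");
  }
  return 0;
}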

The equivalent JS code:

const re = new RegExp("^(?:(?:(?!(?:\\/|^)\\.).)*?\\/c)$");
const match = re.exec("a.b/c");
if (match) {
  console.log("ok");
} else {
  console.log("nok");
}

My question then is: how can I get the same results in C++ as I would in JS? Is it possible to run V8's RegExp from a pure C++ context?

WKnight02
  • Most modern JS environments implement the ECMAScript 2018+ standards, while C++ `std::regex` uses an older ECMA-262 grammar (with some modifications related to the use of POSIX character classes). You can't expect JS regexps to work with `std::regex` – Wiktor Stribiżew Jul 16 '20 at 18:15
  • Thanks for clearing this up. So now the question is: what else can I use from a Node native addon perspective? – WKnight02 Jul 16 '20 at 18:21
  • This looks like a `g++` bug to me. I am getting `ok` on [`clang (v11.0.0)`](https://wandbox.org/permlink/6togo03DNNv8ab5R) and [`vc++ (v19.00.23506x86)`](https://rextester.com/TDUXH48750). – brc-dd Jul 16 '20 at 18:47
  • Good catch. I was running in WSL with `clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)`. I just tried with `clang version 10.0.0` on the actual Windows machine, and now I get `ok` as well... – WKnight02 Jul 16 '20 at 19:12
  • Since `/` is not a special regex metacharacter, why do you escape it `"\\/"` ? –  Jul 16 '20 at 19:25
  • @Maxt8r the regex I posted here comes from a tool that generates them based on some input glob (see "npm minimatch"). I don't know why `/` is escaped here, but I don't think this causes any issue? – WKnight02 Jul 16 '20 at 19:27
  • It's just an eyesore. It's called _leaning toothpick syndrome_. It's no big deal, but for clarity, the only things in a regex that should be escaped are metacharacters that are being matched literally. Unnecessary escapes are annoying to read when diagnosing problems. –  Jul 16 '20 at 19:32
  • Your regex is _identical_ to this: `"^(?!(?:.+/)?\\.).*/c$"`. Try that out with both the suspect and the non-suspect language/compiler. –  Jul 16 '20 at 19:46
  • @Maxt8r Your regex does indeed work with the bogus compiler. My issue is that I don't really have control over the input regexps... So yes, we can find something that works, but if what the tool gives me is bogus, what would I do? That's why I was hoping someone knew how to use V8's regexp engine outside of a JS isolate/context. – WKnight02 Jul 17 '20 at 15:33
  • `string from JS to C++` As far as I know the only interface from JS to C or assembly is the compiler that makes JS. –  Jul 17 '20 at 22:48
  • @Maxt8r I'm sorry but I don't understand your point? – WKnight02 Jul 20 '20 at 02:43
  • Did you say you're _passing_ a string from one code form to another and back ? Are you using Com+ ? Or are you saying you are modifying the JS source code and compiling regex in that C++ code ? –  Jul 20 '20 at 19:14
  • I'm saying that when making a native NodeJS addon (using node-gyp and all the fun stuff) I implement a C++ function which is called by JS. At this point, the C++ function has access to the parameters passed by JS, and right now said function looks for a string in the arguments, which it uses to create an `std::regex` instance. – WKnight02 Jul 20 '20 at 23:18
