1

I have the use case, where I need to allow for processing an arbitrary array of strings, by an arbitrary regex, which is created either by a regex literal, or through the new RegExp() constructor.

It all works fine, until the global g flag is used with capturing groups.

I read a few answers on SO, and the proposed solution is to use regex.exec(string) in a while loop, e.g. How do you access the matched groups in a JavaScript regular expression?, JavaScript regular expressions and sub-matches

I also talked about it on IRC, and was advised against the implementation all together:

but there's regexes that will segfault your engine, unless you're using spidermonkey.

So here is a corner case, try pasting it into a fiddle or plunker, or even the console, it just breaks:

var regexString = '([^-]*)';
var flags = 'ig';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var match;
  var inter = [];
  while (match = regex.exec(item)) {
    inter.push(match[0]);
  }
});

console.log(result);

I did tried it on regex101.com https://regex101.com/r/xG0cL4/1 It breaks even if I do it without the quantifier, i.e. /([^-])/g https://regex101.com/r/yT7sQ2/1

My question: what is the (correct|safe) way to process arbitrary regexes against arbitrary strings?

Community
  • 1
  • 1
wiherek
  • 1,923
  • 19
  • 25
  • `while(match = regex.exec(item)) {` looks like an infinite loop – Travis J Apr 09 '15 at 23:25
  • Isn't `result` always `[]`? What is the expected result? – Jason Cust Apr 09 '15 at 23:30
  • @TravisJ It's not an infinite loop. When you use the global flag, each call to `regex.exec` advances the pointer to the next match, and it returns `false` when there are no more. – Barmar Apr 09 '15 at 23:32
  • Its unclear what you mean by arbitrary regexes. Do you actually need to use a regex as a variable? – aebabis Apr 09 '15 at 23:35
  • 1
    I think the problem is that `[^-]*` matches the empty string before the `-`, and doesn't advance the pointer because the match is zero-length, so you get an infinite loop. Change to `[^-]+` and it should work. But your question isn't about this specific regexp, it's about how to use regexps in general, right? – Barmar Apr 09 '15 at 23:39
  • yes, this is a tool to be used in a small framework. – wiherek Apr 09 '15 at 23:39
  • yes, due to the nature of the use case, not this specific corner-case, I wonder if there is a simple way to prevent it from a failure. Perhaps passing the process to a separate thread (i.e. a worker), and killing it from the main thread if it does not return within a short time? Just thinking out loud. – wiherek Apr 09 '15 at 23:41

3 Answers3

1

It's not working because when '-' is reached, exec() fails to match the '-' character, but does match 0 characters (because of *), so it doesn't skip it, and thus it gets stuck. If you use -|([^-]*), then it will skip the '-' character. You will need to then check the 'match.index' property to see if you've reach the end.

Also, you should be adding match[1] not match[0] if your intent is to save the matched text.

This works:

var regexString = '-|([^-]*)'; // or better yet: '([^-]+)' will work also
var flags = 'ig';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var match;
  var inter = [];      
  while (match = regex.exec(item)) {
    if (match.index >= item.length) break;
    else if (match[1] !== void 0) inter.push(match[1]);
  }
});

console.log(result);

but why not use 'match()' instead?

var regexString = '[^-]+';
var flags = 'gi';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var inter = item.match(regex);
});

console.log(result);
James Wilkins
  • 6,836
  • 3
  • 48
  • 73
  • In general case, you can check the if `match[0].length == 0` then increment `lastIndex`: http://stackoverflow.com/a/26923233/1400768 In this case, there is no problem with your solution, since any string will match either branch. – nhahtdh Apr 10 '15 at 06:14
0

The match object has an index property that contains the position of the current match. If that stays the same between two calls in the loop, it means you're stuck.

var regexString = '([^-]*)';
var flags = 'ig';
var regex = new RegExp(regexString, flags);

var arr = ['some-property-image.png', 'another-prop-video.png', 'y-no-work.bmp'];
var result = [];

arr.forEach(function(item) {
  var match;
  var inter = [];
  var lastIndex = -1;
  while (match = regex.exec(item)) {
    if (match.index == lastIndex) {
      break;
    }
    lastIndex = match.index;
    inter.push(match[0]);
  }
  result.push(inter);
});

console.log(result);
Barmar
  • 741,623
  • 53
  • 500
  • 612
0

Why it's crashing: an infinite loop was created by using a RegExp object with a global flag and exec in a loop with a zero-width match. The loop hits the first '-' but doesn't match that character because of the negated character class. It then reverts back to the zero-length match and so doesn't advance the index value for exec. This means on the next loop it starts back at the same spot and does the exact same thing...infinitely.

That said, it's hard to tell what exactly you want but why not try match? It looks like you only care about the matching string so exec seems to be a bit of overkill.

If the desired output is a one-to-one array of results with the input array:

function foo(regexp, strings) {
  return strings.reduce(function(matches, str) {
    matches.push(str.match(regexp));
    return matches;
  }, []);
}

foo(/([^-]+)/ig, arr);
// outputs: [["some","property","image.png"],["another","prop","video.png"],["y","no","work.bmp"]]

foo(new RegExp('([^-]+)', 'ig'), arr);
// outputs: [["some","property","image.png"],["another","prop","video.png"],["y","no","work.bmp"]]

Even with the zero-width matching it won't go into an infinite loop:

foo(/([^-]*)/ig, arr));
// outputs: [["some","","property","","image.png",""],["another","","prop","","video.png",""],["y","","no","","work.bmp",""]]

If the desired output really is one array of all the matches:

function foo(regexp, strings) {
  return strings.reduce(function(matches, str) {
    return matches.concat(str.match(regexp));
  }, []);
}

foo(/([^-]+)/ig, arr);
// outputs: ["some","property","image.png","another","prop","video.png","y","no","work.bmp"]

foo(new RegExp('([^-]+)', 'ig'), arr);
// outputs:  ["some","property","image.png","another","prop","video.png","y","no","work.bmp"]

foo(/([^-]*)/ig, arr));
// outputs: ["some","","property","","image.png","","another","","prop","","video.png","","y","","no","","work.bmp",""]
Jason Cust
  • 10,743
  • 2
  • 33
  • 45