
Most of the top Node.js modules I have inspected define their regexps in the module scope, outside the function that uses them.

For example, here are a few lines taken from Busboy, the fastest multipart/form-data parser for Node.js:

var RE_SPLIT_POSIX =
      /^(\/?|)([\s\S]*?)((?:\.{1,2}|[^\/]+?|)(\.[^.\/]*|))(?:[\/]*)$/;
function splitPathPosix(filename) {
  return RE_SPLIT_POSIX.exec(filename).slice(1);
}

Besides reusability, is there any speed benefit to doing this rather than moving the regexp inside the function? Like this:

function splitPathPosix(filename) {
  return /^(\/?|)([\s\S]*?)((?:\.{1,2}|[^\/]+?|)(\.[^.\/]*|))(?:[\/]*)$/.exec(filename).slice(1);
}

I know that regexps are compiled for better performance. Does that mean the last code snippet needs to recompile the regexp every time the function is executed? I would guess that most JavaScript engines cache the compiled regexp.
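
One thing that can be checked directly: under ES5 and later, a regexp literal yields a new `RegExp` object each time it is evaluated. That is separate from whether the engine recompiles the pattern. A minimal sketch (the function name `makeInline` is hypothetical, chosen only for illustration):

```javascript
// Hypothetical helper: returns the result of evaluating a regexp literal.
function makeInline() {
  return /ab+c/;
}

var first = makeInline();
var second = makeInline();

// Each evaluation of the literal creates a distinct RegExp object (ES5+).
console.log(first === second);               // false
// Both objects carry the same pattern source.
console.log(first.source === second.source); // true
```

Whether the compiled form of the pattern is shared between those two objects is an engine implementation detail, and this snippet does not show it either way.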

I'm specifically interested in V8/Node.js here, but general knowledge about how other engines work would be interesting as well.
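
For anyone who wants to measure this on their own Node.js version, here is a rough micro-benchmark sketch. The names `splitOuter`, `splitInner` and `time`, the sample paths, and the iteration counts are all assumptions made for illustration; timings from a loop like this are noisy and say little about a real application.

```javascript
// Variant 1: regexp defined once in the module scope.
var RE_SPLIT_POSIX =
      /^(\/?|)([\s\S]*?)((?:\.{1,2}|[^\/]+?|)(\.[^.\/]*|))(?:[\/]*)$/;

function splitOuter(filename) {
  return RE_SPLIT_POSIX.exec(filename).slice(1);
}

// Variant 2: the same regexp literal evaluated inside the function.
function splitInner(filename) {
  return /^(\/?|)([\s\S]*?)((?:\.{1,2}|[^\/]+?|)(\.[^.\/]*|))(?:[\/]*)$/.exec(filename).slice(1);
}

// Hypothetical timing helper: runs fn `iterations` times, returns elapsed ms.
function time(fn, iterations) {
  var start = process.hrtime();
  for (var i = 0; i < iterations; i++) {
    fn('/home/user/dir/file' + (i % 100) + '.txt');
  }
  var diff = process.hrtime(start); // [seconds, nanoseconds]
  return diff[0] * 1e3 + diff[1] / 1e6;
}

// Warm up both variants so the JIT has a chance to optimize them.
time(splitOuter, 10000);
time(splitInner, 10000);

console.log('module scope:   %s ms', time(splitOuter, 100000).toFixed(1));
console.log('inline literal: %s ms', time(splitInner, 100000).toFixed(1));
```

Running it several times and swapping the order of the two measurements helps to tell a real difference from noise.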

cronvel
  • This case is particularly interesting because the regex can match an empty string. A problem may arise with multiple matches when `exec` is used in a loop; then you would need to manually advance `re.lastIndex`. In this case, there should be no change since the whole string is matched. – Wiktor Stribiżew Sep 17 '15 at 10:07
  • The chances that defining the regexp inside your function vs. outside will make any observable difference in the performance of your application are negligible at best. –  Sep 17 '15 at 10:08
  • You could do an experiment with 100000 strings with both approaches, and see for yourself :) However, it's probably not a [big deal](http://programmers.stackexchange.com/questions/80084/is-premature-optimization-really-the-root-of-all-evil) – Tholle Sep 17 '15 at 10:10
  • @stribizhev: It doesn't matter even without the anchors, because there's no `g` flag. `lastIndex` isn't set when the expression isn't global. E.g., `var x = /./; console.log(x.exec("foo")[0], x.lastIndex); console.log(x.exec("foo")[0], x.lastIndex);` logs `f 0` twice, not `f 1` then `o 2`. – T.J. Crowder Sep 17 '15 at 10:15
  • Thanks for the reply. I'm well aware of the danger of early micro-optimization, yet I'm still curious about how things work behind the scenes. Thanks for finding the duplicate ;) – cronvel Sep 17 '15 at 12:19
  • It should be noted that all of that path-related code (including those RegExps) came from node's own `path` module. I merely included the code directly for better backwards compatibility (node 0.10 does not have exported platform-specific `path` functions). – mscdex Sep 19 '15 at 14:24
  • Thanks for this response @mscdex. If there are no speed benefits at all, it should be just a matter of taste. – cronvel Sep 21 '15 at 08:18

0 Answers