24

Given this function:

function doThing(values,things){
  var thatRegex = /^http:\/\//i; // is this created once or on every execution?
  if (values.match(thatRegex)) return values;
  return things;
}

How often does the JavaScript engine have to create the regex? Once per execution or once per page load/script parse?

To prevent needless answers or comments, I personally favor putting the regex outside the function, not inside. The question is about the behavior of the language, because I'm not sure where to look this up, or if this is an engine issue.


EDIT:

I was reminded I didn't mention that this was going to be used in a loop. My apologies:

var newList = [];
for (const item1 of ListOfItems1) {
  for (const item2 of ListOfItems2) {
    newList.push(doThing(item1, item2));
  }
}

Since it's going to be called many times in a loop, it makes sense to define the regex outside the function; that's the idea.

Also note that the script is deliberately generic, so that only the behavior and cost of the regex creation are examined.

jcolebrand
  • 15,889
  • 12
  • 75
  • 121
  • Could you take a look at my [answer](https://stackoverflow.com/a/32524171/3345644) and consider making it as the accepted one? The current accepted answer is outdated and doesn't hold practical value anymore since using `RegExp()` is known to be a bad practice long time ago. – Alexander Abakumov Aug 23 '21 at 19:07

4 Answers

17

From Mozilla's JavaScript Guide on regular expressions:

Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.

And from the ECMA-262 spec, §7.8.5 Regular Expression Literals:

A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated.

In other words, it's compiled once, when the script is first parsed and evaluated.

It's worth noting also, from the ES5 spec, that two literals will compile to two distinct instances of RegExp, even if the literals themselves are the same. Thus if a given literal appears twice within your script, it will be compiled twice, to two distinct instances:

Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.

...

... each time the literal is evaluated, a new object is created as if by the expression new RegExp(Pattern, Flags) where RegExp is the standard built-in constructor with that name.
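The quoted behavior can be checked with a small sketch. The helper name `makeLiteral` is hypothetical and not part of the answer; the sketch assumes an ES2015+ engine (for `const` and the `flags` property) and only illustrates that each evaluation of a literal yields a distinct object.

```javascript
// Hypothetical helper: evaluates the same regex literal once per call.
function makeLiteral() {
  return /^http:\/\//i;
}

const first = makeLiteral();
const second = makeLiteral();

// Each evaluation of the literal produces a distinct RegExp object (ES5+).
console.log(first === second); // false

// The two objects still describe the same pattern and flags.
console.log(first.source === second.source && first.flags === second.flags); // true
```

This says nothing about how often the engine compiles the pattern internally; it only shows object identity.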

BoltClock
  • 700,868
  • 160
  • 1,392
  • 1,356
  • 4
    “each time the literal is evaluated” means that a new instance will be generated each time evaluation happens. Evaluation happens during runtime, not during parsing. So each time a function with a `RegExp` literal in it is executed, if that `RegExp` literal is used or assigned, it will be a new instance. I believe you are thinking of ES3 behavior where a new instance is created during parsing and each literal **always** evaluates to the same object (instead of evaluating to a new object). See https://stackoverflow.com/a/48310965/429091 – binki Jan 17 '18 at 22:27
16

The provided answers don't clearly distinguish between two different processes behind the scenes: regexp compilation, and RegExp object creation when execution reaches an expression that creates a regexp object.

Yes, by using regexp literal syntax you gain the performance benefit of one-time regexp compilation.

But if your code executes in an ES5+ environment, every time the code path enters the doThing() function in your example, it actually creates a new RegExp object, though without needing to compile the regexp again and again.

In ES5, literal syntax produces a new RegExp object every time the code path hits an expression that creates a regexp via a literal:

function getRE() {
    var re = /[a-z]/;
    re.foo = "bar";
    return re;
}

var reg = getRE(),
    re2 = getRE();

console.log(reg === re2); // false
reg.foo = "baz";
console.log(re2.foo); // "bar"

To put actual numbers on the statements above, take a look at the performance difference between the storedRegExp and inlineRegExp tests in this jsperf.

storedRegExp would be about 5-20% faster across browsers than inlineRegExp; the difference is the overhead of creating (and garbage collecting) a new RegExp object every time.

Conclusion:
If you're heavily using your literal regexps, consider caching them outside the scope where they are needed, so that not only are they compiled once, but the actual regexp objects for them are created once as well.
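As a minimal sketch of that caching advice, applied to the function from the question: the constant name `HTTP_PREFIX` and the sample arrays are assumptions made up for illustration, and `test()` is swapped in for the original `match()` because only the truthiness of the result is used.

```javascript
// Created once, when this script (or module) is evaluated.
// The name HTTP_PREFIX is illustrative, not from the question.
const HTTP_PREFIX = /^http:\/\//i;

function doThing(values, things) {
  // The regex has no global or sticky flag, so the shared object
  // carries no lastIndex state from one call to the next.
  return HTTP_PREFIX.test(values) ? values : things;
}

// Sample data, assumed for the sake of a runnable example.
const listOfItems1 = ['http://example.com', 'ftp://example.com'];
const listOfItems2 = ['fallback'];

const newList = [];
for (const item1 of listOfItems1) {
  for (const item2 of listOfItems2) {
    newList.push(doThing(item1, item2));
  }
}

console.log(newList); // [ 'http://example.com', 'fallback' ]
```

If the cached regex used the `g` or `y` flag, sharing one object across calls would also share its `lastIndex`, which is worth keeping in mind before hoisting such a regex.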

Alexander Abakumov
  • 13,617
  • 16
  • 88
  • 129
  • 1
    You mention that ES5+ creates a new RegExp object each time the literal is evaluated. Did it work differently in earlier Javascript versions? See https://stackoverflow.com/questions/46332713/what-happens-when-i-set-the-same-variable-to-the-same-regex-value-in-multiple-st where the questioner quotes a 5-year-old book that claims that the regexp object will be reused when it's evaluated each time through a loop. – Barmar Sep 20 '17 at 23:25
  • 2
    This is the best answer I've seen regarding this topic. – Lonnie Best Jan 09 '19 at 13:57
  • 1
    The `jsperf` link in your answer does not seem to work anymore. Otherwise great answer! – Petr Srníček Dec 11 '19 at 22:42
  • 1
    This is the most complete answer. I came here looking to verify whether using a regex literal will fail to compare as equal in React, and therefore cause unnecessary renders. The conclusions from this is - yes, it could cause unnecessary renders. If you have a regex literal that you want to pass as a prop etc, it's simplest to create it once as a constant, outside of the React component. – mz8i Jun 16 '21 at 16:45
6

There are two "regular expression" type objects in JavaScript: regular expression instances and the RegExp object.

Also, there are two ways to create regular expression instances:

  1. using the /regex/ syntax and
  2. using new RegExp('regex');

Each of these will create a new regular expression instance each time.

However there is only ONE global RegExp object.

var input = 'abcdef';
var r1 = /(abc)/;
var r2 = /(def)/;
r1.exec(input);
alert(RegExp.$1); //outputs 'abc'
r2.exec(input);
alert(RegExp.$1); //outputs 'def'

The actual pattern is compiled as the script is loaded when you use syntax 1:

The pattern argument is compiled into an internal format before use. For Syntax 1, pattern is compiled as the script is loaded. For Syntax 2, pattern is compiled just before use, or when the compile method is called.

But you could still get a different regular expression instance on each method call. Test in Chrome vs. Firefox:

function testregex() {
    var localreg = /abc/;
    if (testregex.reg != null){
        alert(localreg === testregex.reg);
    };
    testregex.reg = localreg;
}
testregex();
testregex();

It's VERY little overhead, but if you want exactly one regex, it's safest to create a single instance outside of your function.

jermel
  • 2,326
  • 21
  • 19
  • By chance where did you get that RegExp.$1 syntax, as apparently it doesn't work correctly(?) in Chrome? – jcolebrand Jan 11 '12 at 16:15
  • [msdn](http://msdn.microsoft.com/en-us/library/windows/apps/9dthzd08(v=vs.94).aspx) it works in chrome/ff/ie even though i dont recommend using it, you should know it for completeness – jermel Jan 12 '12 at 01:15
  • Actually I found the part where it was reported not to work, and it was a legitimate bug in prior Chrome. – jcolebrand Jan 12 '12 at 16:15
5

The regex will be compiled every time you call the function if it's not in literal form.
Since you are including it in a literal form, you've got nothing to worry about.

Here's a quote from websina.com:

Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.

Calling the constructor function of the RegExp object, as follows:
re = new RegExp("ab+c")

Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
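To make the "pattern from another source" case concrete, here is a minimal sketch of building a regex at runtime with the constructor. The helper names `escapeForRegExp` and `makePrefixMatcher` are hypothetical; escaping the input first is an assumption about what you would usually want when the text comes from a user, not something the quote prescribes.

```javascript
// Hypothetical helper: escape characters that are special inside a pattern,
// so the input text is matched literally.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a case-insensitive "starts with" matcher.
// The pattern is only known at runtime, so the constructor is required here.
function makePrefixMatcher(prefix) {
  return new RegExp('^' + escapeForRegExp(prefix), 'i');
}

const startsWithScheme = makePrefixMatcher('http://');
console.log(startsWithScheme.test('HTTP://example.com')); // true
console.log(startsWithScheme.test('ftp://example.com'));  // false
```

When the same runtime pattern is reused many times, building the matcher once and keeping a reference to it avoids constructing a new RegExp on every call.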

Joseph Silber
  • 214,931
  • 59
  • 362
  • 292
  • Interesting, do you have any sources of this behavior? – Kevin Ji Jan 11 '12 at 04:10
  • Is there any reason a clever engine could not see that the literal is constant and compile it just once? It seems a simple optimization, but I'm no expert. – Tikhon Jelvis Jan 11 '12 at 04:11
  • Yeah, that behavior makes more sense--a regex literal can't change but a string *can*, so it would *have* to be recompiled. – Tikhon Jelvis Jan 11 '12 at 04:22
  • 1
    @TikhonJelvis - Strings are immutable - can't be changed in place. Functions that "modify" strings all return a new string with the modified value. – jfriend00 Jan 11 '12 at 04:49
  • 1
    What I meant is that the value of the string might change between function calls: `new RegExp(someVariable + "blarg")`. – Tikhon Jelvis Jan 11 '12 at 04:51