
Let's say I do this:

re = /cat/;
re = /cat/;

From reading Zakas' book about JavaScript, it seems that when the second line executes, no new RegExp object is created in memory; instead, re keeps pointing to the same one. How does this work "under the hood"? Does JavaScript somehow check what is already stored in re? What if I'd written:

re = /cat/;
re = /cats/;

Surely then, a new RegExp object will be created in the second line? How exactly does JavaScript decide whether to create a new object or keep the existing one?

The section of the book that made me draw my conclusions says:

In ECMAScript 3, regular-expression literals always share the same RegExp instance, while creating a new RegExp via constructor always results in a new instance. Consider the following:

var re = null, i;
for (i=0; i < 10; i++){ 
    re = /cat/g;
    re.test(“catastrophe”); 
}

In the first loop, there is only one instance of RegExp created for /cat/, even though it is specified in the body of the loop. Instance properties (mentioned in the next section) are not reset, so calling test() fails every other time through the loop. This happens because the “cat” is found in the first call to test(), but the second call begins its search from index 3 (the end of the last match) and can’t find it. Since the end of the string is found, the subsequent call to test() starts at the beginning again.

By "first loop" he's referring to the one I posted.

Sahand
  • I don't know what the book says, but the spec doesn't mention anything about reusing existing patterns: https://www.ecma-international.org/ecma-262/8.0/index.html#sec-regexpcreate . At least if you run `var re = /cat/; var re_original = re; re = /cat/; console.log(re === re_original);`, in userland code you have two different values. – Felix Kling Sep 20 '17 at 22:36
  • https://stackoverflow.com/questions/8814009/how-often-does-javascript-recompile-regex-literals-in-functions might have some info – Isaac Sep 20 '17 at 22:38
  • So this must be an old thing then? From the book: >"In the first loop, there is only one instance of RegExp created for /cat/, even though it is specified in the body of the loop. Instance properties (mentioned in the next section) are not reset, so calling test() fails every other time through the loop. This happens because the “cat” is found in the first call to test(), but the second call begins its search from index 3 (the end of the last match) and can’t find it. Since the end of the string is found, the subsequent call to test() starts at the beginning again." – Sahand Sep 20 '17 at 22:41
  • I think I know what the author is referring to, but hard to tell without seeing the actual example they are referring to. – Felix Kling Sep 20 '17 at 22:45
  • See my edit, I've added the relevant code snippet. – Sahand Sep 20 '17 at 22:47
  • Running this in Chrome, Safari and Firefox gives me `10 true`. So it's not failing every other time. – Felix Kling Sep 20 '17 at 22:52
  • Same here, so this must be dated information? I have an old copy of the book, it's from 2012. – Sahand Sep 20 '17 at 22:54
  • Don't use stylized quotes (e.g. `“...”`). Use ASCII quotes instead (e.g. `"..."`). The stylized ones are a syntax error in JavaScript. – Patrick Roberts Sep 20 '17 at 23:01
  • I doubt Javascript ever worked the way he described. Either he's mistaken or you're misquoting the book. – Barmar Sep 20 '17 at 23:18
  • Added more context to the quote at the top of it. Make of it what you will. – Sahand Sep 21 '17 at 08:32
  • possible duplicate of [Do objects made by expression literals share a single instance?](https://stackoverflow.com/q/28183907/1048572) – Bergi Jul 08 '19 at 10:14

3 Answers


Either the author is mistaken or JavaScript has changed significantly since the book was written, because that's not how it works now. See How often does JavaScript recompile regex literals in functions? for a number of answers that go into detail about this.

I suspect the author may have confused regexp compilation with RegExp objects. When the compiler sees a regexp literal, it can compile it once. Then it generates code that runs each time through the loop to create a new object that uses that compiled regexp to perform the matching. But each RegExp object has its own state.
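
To make the distinction concrete, here is a minimal sketch, assuming an ES5-or-later engine. The variable names (`shared`, `sharedResults`, `fresh`, `freshResults`) are made up for illustration. It contrasts one RegExp object reused across iterations, which is the behaviour the book describes, with a literal evaluated inside the loop:

```javascript
// One RegExp object reused across iterations: lastIndex carries over,
// so every other call to test() starts searching at index 3 and fails.
var shared = /cat/g;
var sharedResults = [];
for (var i = 0; i < 4; i++) {
    sharedResults.push(shared.test("catastrophe"));
}

// Literal evaluated inside the loop: in ES5+ each pass gets a new object
// whose lastIndex starts at 0, so every call succeeds.
var freshResults = [];
for (var j = 0; j < 4; j++) {
    var fresh = /cat/g;
    freshResults.push(fresh.test("catastrophe"));
}

console.log(sharedResults.join(", ")); // true, false, true, false
console.log(freshResults.join(", "));  // true, true, true, true
```

The first loop reproduces the "fails every other time" pattern from the book only because the same object is deliberately kept in a variable outside the loop.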

Notice that he says he's describing ECMAScript 3. That's a very old edition of ECMAScript, published in 1999. ECMAScript 5 is from 2009 (ES4 was abandoned during development), and that's what most browsers have implemented for several years, with ES6 adoption being phased in during the past couple of years. Maybe ES3 behaved the way he describes, but more recent editions don't.

Barmar
  • He's basically saying what you're saying, except he says that should lead to `false` every other time in the for loop because the regex object is the same. See the extra text added to the top of the quote. – Sahand Sep 21 '17 at 08:34
  • No, he's not saying what I'm saying. He says they should be the same instance (that's the same as what I meant by "object"), but they shouldn't. – Barmar Sep 21 '17 at 14:55
  • Okay, fair enough. I guess the information is just dated then, or it's some kind of mistake, like you said. – Sahand Sep 21 '17 at 18:26
  • Can you point to where ES3/ES5 documents different behavior? Isn’t this a subtle breaking change between ES3 and ES5? – binki Jan 17 '18 at 21:32
  • @binki I wasn't able to find any such documentation, that's why I said maybe. I'm just assuming the author is correct about how it worked in ES3. But I also said he could have been mistaken. – Barmar Jan 17 '18 at 21:33

In modern JavaScript (ES5+), a RegExp literal is specified to produce a new instance each time it is evaluated. In ES3, a distinct RegExp object is created at parse time for each RegExp literal in the source (including literals with the same content), and each “physical” literal always evaluates to that same instance.

So, in both ES5 and ES3, the following code will assign distinct RegExp instances to re:

re = /cat/;
re = /cat/;

However, if these lines are executed multiple times, in ES3 each line assigns the same RegExp object every time it runs. There will be exactly two RegExp instances in total, and the second one is always what re holds after the two lines have executed. If you copy re to another variable (say, savedCopy) after one run and then run the lines again, you will see that re === savedCopy.

In ES5, each execution will produce new instances. So each time those lines run, a new RegExp object will be created for the first line and then another new RegExp object will be created and saved to the re variable for the second line. If you copied re to another variable in the meantime, you will see that re !== savedCopy.
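
As a rough illustration of the "executed multiple times" case, the two lines can be wrapped in a function and run twice. This is only a sketch for an ES5+ engine; `assignTwice` and `sameInstance` are hypothetical names introduced here:

```javascript
var re;

// Hypothetical wrapper so the two assignments can be executed repeatedly.
function assignTwice() {
    re = /cat/;   // first literal
    re = /cat/;   // second literal; this is the object left in re
}

assignTwice();
var savedCopy = re;   // remember what the second literal produced
assignTwice();        // run the same two lines again

// ES5+: the second literal was evaluated again and produced a new object.
// Under the ES3 rules this comparison would have been true instead.
var sameInstance = (re === savedCopy);
console.log(sameInstance); // false in an ES5+ engine
```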

Specs

ECMAScript 3rd Edition (ECMA-262) § 7.8.5 (p. 20) states the following (emphasis added on pertinent text):

7.8.5 Regular Expression Literals

A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (section 15.10.4) or calling the RegExp constructor as a function (section 15.10.3).

ECMAScript 5.1 (ECMA-262) § 7.8.5 states the following (emphasis added on pertinent text):

7.8.5 Regular Expression Literals

A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (see 15.10.4) or calling the RegExp constructor as a function (15.10.3).

This means that the behavior is specified differently between ES3 and ES5.1. Consider this code:

function getRegExp() {
    return /a/;
}
console.log(getRegExp() === getRegExp());

In ES3, that particular /a/ will always refer to the same RegExp instance and the log will output true because the RegExp is instantiated once “when it is scanned”. In ES5.1, every evaluation of /a/ will result in a new RegExp instance, meaning that creation of a new RegExp happens each time the code refers to it because the spec says that it is “converted to a RegExp object (see 15.10) each time the literal is evaluated”.

Now consider this expression: /a/ !== /a/. In both ES3 and ES5, this expression will always evaluate to true because each distinct literal gets a distinct RegExp object. In ES5 this happens because each evaluation of a literal always results in a new object instance. In ES3 this happens because the spec says “Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.”

This change in behavior is documented as an incompatibility with ECMAScript 3rd Edition in ECMAScript 5.1 (ECMA-262) Annex E:

Regular expression literals now return a unique object each time the literal is evaluated. This change is detectable by any programs that test the object identity of such literal values or that are sensitive to the shared side effects.

Old code may have been written to rely on the ES3 behavior. With a literal that uses the g flag, a function could be called multiple times to incrementally walk through the matches in a string, because the shared instance kept its lastIndex between calls. This is similar to how the non-reentrant strtok() function works in C. To get the same effect in ES5, you must store the RegExp instance in a variable yourself and make sure that variable lives long enough, since ES5 effectively gives you the behavior of the reentrant strtok_r() function instead.
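
One way to keep the instance alive in ES5+ is a closure. The following is a minimal sketch rather than a canonical pattern; `makeMatcher`, `nextWord` and `words` are hypothetical names used only for this example:

```javascript
// Hypothetical helper: returns a function that yields the next match of
// `pattern` in `text` on each call, or null when there are no more.
function makeMatcher(pattern, text) {
    // Created once and kept alive by the closure, so lastIndex persists
    // between calls (the role the shared literal played under ES3).
    var re = new RegExp(pattern, "g");
    return function next() {
        var m = re.exec(text);
        return m ? m[0] : null;
    };
}

var nextWord = makeMatcher("\\w+", "one two three");
var words = [nextWord(), nextWord(), nextWord(), nextWord()];
console.log(words); // "one", "two", "three", then null
```

Because each call to makeMatcher creates its own RegExp, two matchers over different strings do not disturb each other's position, which is the strtok_r()-style behaviour described above.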

Optimization Bugs

Reportedly, some JavaScript implementations have bugs in which RegExp object caching causes observable side effects that should be impossible. The observed behavior does not necessarily match either the ES3 or the ES5 specification. An example for Mozilla is given at the end of this post (in the spoiler text), with the explanation that the bug is not observable when debugging but is observable when the JavaScript runs in non-debug, optimized mode. The blog author wrote a comment saying the bug was still reproducible in stable Firefox as of 2017-03-08.

binki
  • I imagine the intent of the ES3 design was to allow you to write something like `while (result = /foo/g.exec(string))`, so that `/foo/` would be the same object each time through the loop and it would keep its state. But it probably caused more problems than it solved and they changed it incompatibly. – Barmar Jan 17 '18 at 22:38

I'm not familiar with the book, but this is how it works as far as I understand it.

The var statement creates new variables, which have no type of their own, and attaches them to the local scope.

var re;
var i;

or

var re, i;

The null literal produces the null value, which exists on its own, apart from any variable.

null

Assigning a value to a variable in a var statement just points the variable at that value; the variable does not become the value. They are separate things that share a relationship.

var re=null,i;  

Using a regex literal creates a new regex object, which we may or may not assign to a variable.

/cat/g  

or

re=/cat/g

When I reproduce your example in Firefox 52, it only returns true once and never returns false, but if I assign the return value of test() to another variable and log it, I get true ten times.

var re = null, i;

for (i = 0; i < 10; i++) {
    re = /cat/g;
    var x = re.test('catastrophe');
    console.log(x);
}
// logs true ten times

I think that Zakas is explaining an eccentricity found in some browsers due to their implementation of JavaScript. Evaluating a regex literal, or any such expression, should create a new object every time, but there are many implementations called JavaScript, and a lot of them reuse objects as often as possible, which occasionally leads to strange behaviour that is eventually fixed.

I hope that helps

  • Thanks for your answer. He's definitely not describing a quirk of any browser, he's referring to normal Javascript behaviour. I'm guessing Javascript has simply changed since the book was written in 2012. – Sahand Sep 21 '17 at 08:30