Why does my 8-puzzle solution runs faster when I create an array twice

Question

I wrote an algorithm to solve N-puzzle problem using breadth-first search. In trying to make things faster I decided to allocate a large array up front instead of repeatedly pushing and shifting values to an empty array.

By chance I noticed an odd behavior where allocating the large array twice actually made the wall clock time faster. I created a gist with the complete code but the part that's giving me odd behavior is here:

var values = new Array(1000000);

function breadthFirstSearch(state, goalState) {
  values = new Array(1000000);
  // implementation of a breadth first search for the n-puzzle problem
}

The typical runtime when values is initialized twice using node.js is about 550ms on my machine. When I comment out one of the places where I am instantiating values so I am only instantiating it once, the runtime increases to about 650ms - 700ms. It seems to me like the runtime should decrease when only allocating the array one time. I created a video to explain what I am doing and that shows the run times on my machine here.

Oddly, when I ran it on repl.it the runtime is about the same whether it's commented out or not, which makes me think it has something to do with the v8 engine.

Can anyone give me an explanation why the wall clock time increases when doing what should be less work?

Is it faster commented out or slower? It's not clear from the question. — slebetman, Jun 17 '15 at 22:11
@slebetman "What I noticed was that if I create an array of about 1000000 undefined elements myself twice, the algorithm is faster with about 30-100 ms depending on the machine you using.", comment out line 119 `values = new Array(1000000);`, therefore commented out must mean it's faster. — Marko Gresak, Jun 17 '15 at 22:12
Okay, I see. But you expressed yourself poorly, I interpreted it as it's faster if you initialize array beforehand, which would make sense in languages like C, but not in JS due to array constructor implementation. — Marko Gresak, Jun 17 '15 at 22:18
@AbimbolaIdowu btw use [`console.time`](https://developer.mozilla.org/en-US/docs/Web/API/Console/time) to measure time. Works in all modern VMs. — Marko Gresak, Jun 17 '15 at 22:19
@AbimbolaIdowu: Just a friendly reminder. Most of the best answers on this site come from full-time professional programmers. And most of us are married with kids. So a lot of the really good answers come from people who don't have time to debug other people's questions. Therefore it helps if you post code, state the result and state your expectations so people can answer questions without downloading and executing anything. — slebetman, Jun 17 '15 at 22:22
Thanks @MarkoGrešak for the suggestion, I would edit accordingly — hisabimbola, Jun 17 '15 at 22:27
[This answer](http://stackoverflow.com/a/7375418/1276128) is probably what you want. I wouldn't mark this as a duplicate, but reading it will probably give you some idea on what is happening. — Marko Gresak, Jun 17 '15 at 22:28
@MarkoGrešak from that post you shared, the runtime is meant to be slower because I am doing it twice, but it's faster, that's what is baffling me. — hisabimbola, Jun 17 '15 at 22:34
I am now lost, what are you trying to say? Is it faster if you initialize it as `new Array(...)` and then assign values, or if you assign the values as you go? I really don't have the time to go through and understand your 200SLOC program. Make a simple example which illustrates the part that amazes you. — Marko Gresak, Jun 17 '15 at 22:40
@MarkoGrešak: No, none of the reasons mentioned in there could make up for a runtime difference of 30 to 100ms. — Bergi, Jun 17 '15 at 22:59
@Bergi It explains the basics, but not the whole reason of what OP is experiencing, therefore I said it shouldn't be marked as a duplicate. But I'm not going to try to understand all that code just to give an explanation on JS array basics. — Marko Gresak, Jun 17 '15 at 23:20
@MarkoGrešak I created a short video to explain this, I hope this helps https://youtu.be/KzmYH5zBXL4 — hisabimbola, Jun 17 '15 at 23:26
@AbimbolaIdowu thanks for the video, it's simpler to understand. I do not have an explanation for this unfortunately, other than the fact that it's faster might not be true after all. I have created a gist: https://gist.github.com/markogresak/89f8d318e1f8ccb5187c — Marko Gresak, Jun 18 '15 at 00:35
I have updated the gist with times. On my computer, running in the same environment, all calls to *uncommented*, i.e. initializing array twice, takes more time compared to calls to *commented*, as you would expect. I will add explanation in next comment since it's too much to fit in a single comment. — Marko Gresak, Jun 18 '15 at 01:16
Now the explanation of what you were showing in the videos is just by chance, you have to average multiple measurements to get more accurate data. As you probably know, a lot of factors affect execution time and when you have 450-550ms execution time (20% tolerance), you can't really conclude that 30 - 100ms faster means that the uncommented code actually runs faster, since the deviation still falls in the tolerance range. I hope this explains your problem and I hope you have learned that you need to take multiple measurements and average those in order to get more accurate data. — Marko Gresak, Jun 18 '15 at 01:16
@MarkoGrešak: Thanks, please post that as an answer (or vote to close as "not reproducable"). However, could you maybe try to swap the two `measureTime()` statements and measure again? Alternatively, duplicate them to have some "warmup runs"? — Bergi, Jun 18 '15 at 02:45
I mean just to swap the [two lines 260 and 261](https://gist.github.com/markogresak/89f8d318e1f8ccb5187c#file-index-js-L260) and run the tests again. Sometimes results depend on the order of the tests, especially in cases like this where we have so many confusing global variables. — Bergi, Jun 18 '15 at 02:50
Sorry I deleted my old comment because I realized what did you mean. If you look at the beginning, there is an `init` function, which (re)initializes all the values on each start. That is because as you said, all these confusing global states end in a state that cannot be restarted. So the simplest solution was to just reset everything. I have tried with switching the statements and doing other things that might affect the time, but it's always the same, commented is faster as it should be. — Marko Gresak, Jun 18 '15 at 02:55
@MarkoGrešak that's a great benchmark you've included, thanks a ton for putting that together. I'm now curious to see what his results are running your benchmark on his machine. Also, which version of io.js where you using? I have a feeling Abimbola's is not up to date. Please add your benchmark as an answer and get some well deserved rep. — Nick Larsen, Jun 18 '15 at 03:03
Thanks @NickLarsen! I am using `v2.1.0`. I have also tested the same code with node.js `v0.12.0` and the patterns are same, only difference is that io.js is ~20% faster. — Marko Gresak, Jun 18 '15 at 03:07
This is a bit older, but I believe most of what is said in this talk still applies today for the most part. https://www.youtube.com/watch?t=515&v=mHtdZgou0qU I would spend some time watching this. — Norman Breau, Jun 18 '15 at 03:31

score 3 · Answer 1 · edited Jun 25 '15 at 18:45

I have written most of what will be in this answer in the comments, because at the time of writing those comments, this question was locked.

I have extended original code with better time measuring calls, which are using performance.now, which is available in modern browser or as node module performance-now. The code is uploaded as this gist.

What I've changed:

Added function init(), which (re)initializes all global variables, otherwise the program won't work across multiple runs.
Duplicated function breadthFirstSearch into breadthFirstSearchUncommented and breadthFirstSearchCommented, first one initializes values = new Array(1000000); on each call and the second one uses global variable values as it is, i.e. the statement mentioned before is commented / removed.
Modified function time to accept which BFS function should be used.
Added function measureTime, which calls time with given BFS function argument and measures min, max and average of runs (argument) measures. At the end, it outputs total time (sum of times for all time calls). Code for this function is below and of course on the gist, at the end of index.js file.
Removed add logging messages, only log messages left are the beginning and end of each measureTime call.

Function measureTime:

function measureTime(bfsFn, label, runs) {
  console.log('starting', runs, 'measures for', label);
  var diff = 0;
  var min = Number.MAX_VALUE, max = Number.MIN_VALUE;
  for (var i = 0; i < runs; i++) {
    init();
    var start = now();
    time(bfsFn);
    var d = now() - start;
    diff += d;
    min = Math.min(d, min);
    max = Math.max(d, max);
  }
  var avg = diff / runs;
  console.log('%s - total time for %d measures: %sms', label, runs, diff.toFixed(3));
  console.log('%s - avg time: %sms (%sms - %sms, Δt = %sms)',
    label, avg.toFixed(3), min.toFixed(3), max.toFixed(3), (max - min).toFixed(3));
}

The actual answer: The question is why was program which was defining new array twice, i.e. once at the beginning and once at each BFS function call, faster than when the initialization code was commented.

The answer is that there is no explanation for this. It's not a result of how JavaScript VMs work, it's not any JavaScript quirks, it's much simpler: The statement in question is actually false.

What OP was saying, and was later even kind enough to provide a video explanation, was true only for single runs. The most important reason is, as we all know, modern operating systems are doing a lot of work simultaneously -- I know this is not exactly true, but bear with me for the sake of this explanation. This means that it's very difficult to have exact same runtime in multiple runs, therefore we see some fluctuation in execution times among multiple runs.

To get more accurate measurements, I called function measureTime 100 times for each commented and another 100 times for uncommented function. This created a sample which minimizes the fluctuations.

I created table of comparison for different execution environments. All tests were ran on OS X Yosemite, using most recent stable versions of browsers and io.js as of 6/18/2015.

|   Environment   |  Uncommented time [ms]  |  Commented time [ms]  |
| :-------------- | :---------------------- | :-------------------- |
| Chrome          |         460.254         |        424.725        |
| Firefox         |         796.471         |        781.662        |
| Safari          |         529.492         |        474.580        |
| Node.js (io.js) |         411.804         |        394.807        |

This is the best table I could do since Stack Overflow doesn't support table markdown. For an actual table style, take a look at the gist.

In this table, it's clear that original statement saying code with uncommented array reinitialization is faster, as we can see that this is false for all four environments I tested.

The lesson to be learned here is do not make performance conclusions based on just few runs. Data of multiple test runs should be collected and be interpreted average of all values or in some cases, more sophisticated statistics methods should be used to get more accurate results.

Thanks for your answer and I am sorry for replying late, I was away. It is actually shocking to me how all things seem to be normal with the changes you made, because before asking this question here, I had tried it on several machines and I got the same response as was shown in the video though I did not try is multiple times like you did. I intend to look more into this, and see if I can get an explanation as to why it happened like that on my code and not the one you edited. I would get back to you if I find anything interesting. Thanks — hisabimbola, Jun 21 '15 at 04:54

Why does my 8-puzzle solution runs faster when I create an array twice

1 Answers1