2

When processing a very large in-memory array of uniform (same-type) JavaScript objects (each with just a few columns)...

Is there any impact on performance and/or or any other penalty to consider when choosing to iterate through it as row->column versus column->row?

Example

We have 100,000 data rows from a CSV file, each is an object with 10 integer values, and we need to touch every value in it for a certain calculation.

Will it make any difference whether we iterate through it vertically or horizontally? Does the modern V8 care about such things in the slightest?

var data = [...]; // array of many same-type objects;

// horizontal iteration:
for (var p in data[0]) {
    for (var i = 0; i < data.length; i++) {
        var value = data[i][p]; // current value;
        // calculate here from the value;
    }
}

// vertical iteration:
for (var i = 0; i < data.length; i++) {
    for (var p in data[0]) {
        var value = data[i][p]; // current value;
        // calculate here from the value;
    }
}
vitaly-t
  • 24,279
  • 15
  • 116
  • 138
  • how can you iterate through an array by column without iterating through the rows on the inner loop? That doesn't make sense to me. The notion of column isn't well defined in javascript. So, yes, any method that I can imagine for iterating "by column" would be much slower than by row. Can you show some code to explain what you mean? – Christian Fritz Feb 13 '16 at 03:17
  • @ChristianFritz: If the members (properties/indices) of the inner structure are consistent, that can be defined in the outer loop. If the inner structure is an Object and the outer is an Array, I think it can make a performance difference to have the object on the outside since `for-in` iteration is usually slower than `for`. –  Feb 13 '16 at 03:21
  • At a guess, I'd assume that by row is probably faster. Given that the data is probably constructed by row and not by column, then the likelihood that the data for a row is stored in a contiguous block of memory (IE: More likely to have the rest of the row's data in cache already), I'd guess that it would be faster. Just a thought, may be wrong, but is what I'd be thinking. Overall, wouldn't expect much difference. – JosephGarrone Feb 13 '16 at 03:21
  • @ChristianFritz, I added an example to your question. – vitaly-t Feb 13 '16 at 03:21
  • I just saw the code edit, given that, and Squint's comment, I would choose the horizontal iteration, only need to loop `for-in` numColumns times, vs numColumns * numRows if using vertical. – JosephGarrone Feb 13 '16 at 03:23
  • @JosephGarrone it would be nice to understand the cons and pro-s in that... – vitaly-t Feb 13 '16 at 03:31
  • As vitaly-t said, `for-in` is slower, so it would follow that performing more `for` loops as opposed to `for-in` would speed up the process. Here is an SO answer on the speed - http://stackoverflow.com/questions/13645890/javascript-for-in-vs-for-loop-performance – JosephGarrone Feb 13 '16 at 03:32
  • @JosephGarrone I didn't say that, it was squint ;) – vitaly-t Feb 13 '16 at 03:35

3 Answers3

2

Only one way to be sure is to run some benchmarks, I made a jsperf: http://jsperf.com/vertical-vs-horizontal-loop The results depends on the engine like expected. Early results from tests I did (Chrome 42 is Edge on Window 10):

|     UserAgent    | horizontal iteration | vertical iteration | vertical iteration with caching | # Tests |
|:----------------:|:--------------------:|:------------------:|:-------------------------------:|:-------:|
| Chrome 42.0.2311 |         1,067        |         287        |               226               |    2    |
|   Firefox 43.0   |         5,621        |         415        |               443               |    2    |
|      IE 11.0     |          976         |         441        |               313               |    2    |
|  Iron 46.0.2450  |         1,557        |         901        |              1,907              |    2    |
(numbers are ops/s, the higher the better)

Interestingly enough, horizontal iteration ranges from twice faster to dozen of times faster (on Firefox). But vertical iteration with caching is the fastest only on Iron 46 (Chromium fork so V8 engine).

Benchmarks with node v5.1.0:

horizontal iteration x 1,140 ops/sec ±1.11% (63 runs sampled)
vertical iteration x 833 ops/sec ±0.92% (68 runs sampled)
vertical iteration with caching x 1,678 ops/sec ±1.13% (67 runs sampled)
Fastest is vertical iteration with caching

Shanoor
  • 13,344
  • 2
  • 29
  • 40
  • Your follow-up contradicts the result, i.e. you reversed it, by mistake, I believe. You meant to say vertical is faster, not horizontal. – vitaly-t Feb 13 '16 at 05:09
  • @vitaly-t No, without caching, horizontal is faster in all browsers/engines, if you add caching, vertical is fastest only with V8. The numbers are ops/s not timing so the higher, the better :) – Shanoor Feb 13 '16 at 05:14
  • Got it. I wrongfully assumed the figures were delays, with smaller being better, but it is the other way round. Good test! Thank you! – vitaly-t Feb 13 '16 at 05:17
0

Vertical in reverse is better for better memory utilisation and performance. I would say iterate through data array in reverse. And few small things to taken into consideration is incrementing i with i += 1 instead of i++ or ++i. But these would show very small performance benefits as you are iterating through a large array. If you ask me, I follow either of below approaches

  1. Use Async library's each function to iterate through data array and do operations. The advantages are they are truly asynchronous, so wont block the execution thread. I have used this library to compare two large arrays where each array operation includes drawing html to canvas and comparing. Async js did that job very well for me.
  2. Create worker (n) threads depending on user machine, then slice the large array into smaller chunks, feed workers with chunked arrays. Whichever worker completes first will pick another chunked array from queue. At the end you can aggregate the result. I have tested this approach myself. I tried to sort an array of 50K array items. With regular execution browser was blocked but with this approach it was able to complete perfectly fine. I even tried with 300K items. I would say, this is better approach if you are not supporting lower end browsers.
Exception
  • 8,111
  • 22
  • 85
  • 136
  • 2
    Why in reverse is better? Any link to read about it? I just want to understand why it is better. – vitaly-t Feb 13 '16 at 03:48
  • *"And few small things to taken into consideration is incrementing i with i += 1 instead of i++ or ++i. But these would show very small performance benefits"* - Why on earth would `+=1` give a performance benefit over `++`? – nnnnnn Feb 13 '16 at 03:56
  • @nnnnnn Check this link https://jsperf.com/for-loop-i-vs-i/4. Earlier Crockford's approach used to be faster. Now browsers seems to be improved in that area. And Zakas also mentioned to use `i += 1` instead of `i++` or `++i` in his design patterns book. – Exception Feb 13 '16 at 04:28
  • Modern JS engines are pretty smart about that sort of thing. Crockford doesn't recommend +=1 for performance reasons though. – nnnnnn Feb 13 '16 at 06:10
0

Vertical is obviously much better since you can cache data[i] and only look it up once before checking the properties.

// vertical iteration:
for (var i = 0; i < data.length; i++) {
    var obj = data[i] // <--- here
    for (var p in obj) {
        var value = obj[p]; // current value;
        // calculate here from the value;
    }
}
Oleg V. Volkov
  • 21,719
  • 4
  • 44
  • 68
  • "obviously much better" is subjective and not objective. The cost of assigning a variable for each iteration of the outer loop most likely completely negates any performance benefit you actually get. Calling `data[i]` several times is more efficient than assigning it to a variable once and calling that variable several times. – michael Feb 13 '16 at 04:24
  • @michael, how it can be "subjective" when it is plain and simple "not doing unnecessary operation" vs. "doing it repeatedly". "Not doing" will always objectively be faster. Eliminating work completely is the basics of optimization. – Oleg V. Volkov Feb 13 '16 at 10:16