69

Why is looping through an Array so much faster than JavaScript's native indexOf? Is there an error or something that I'm not accounting for? I expected native implementations would be faster.

                For Loop        While Loop      indexOf
Chrome 10.0     50,948,997      111,272,979     12,807,549
Firefox 3.6     9,308,421       62,184,430      2,089,243
Opera 11.10     11,756,258      49,118,462      2,335,347   

http://jsben.ch/#/xm2BV

EscapeNetscape
  • 2,892
  • 1
  • 33
  • 32
Lime
  • 13,400
  • 11
  • 56
  • 88
  • 2
    This result becomes more questionable if you try it with strings. I ran some simple tests with a while loop http://jsperf.com/indexof-vs-while-loop feel free to improve the tests, but it's worth considering string comparisons and even larger arrays before jumping to conclusions. – Frug Mar 24 '14 at 20:46
  • 1
    Frug, you use a fixed order array and call it "random"... Relevant XKCD https://www.xkcd.com/221/ – Job Jan 18 '18 at 16:03
  • What do the numbers in the table represent? Is that the average time it took to complete the operation, or number of times the operation was completed in a given amount of time? Or something else? – Ben Sutton Nov 23 '19 at 21:19

6 Answers6

117

5 years from then, lot of changes happened in browsers. Now, indexOf performance has increased and is definitely better than any other custom alternative.

Version 49.0.2623.87 (64-bit)

Chrome Version 49.0.2623.87 (64-bit)

Pierre Maoui
  • 5,976
  • 2
  • 27
  • 28
  • 1
    thus proving the question is invalid. – a3.14_Infinity Mar 25 '16 at 10:49
  • 56
    @a3.14_Infinity outdated maybe not invalid – Lime Apr 02 '16 at 21:39
  • 1
    @pierre-maouni what tool did you use to get the results like that? or did you design it yourself? – Ayyash Oct 30 '17 at 06:50
  • @Ayyash, I've used jsperf.com – Pierre Maoui Nov 12 '17 at 17:20
  • 4
    Can you link to the test you used for this? The screenshot itself doesn't show the implementation, and given my results from , I'm getting drastically different results. – agweber Jun 15 '18 at 22:47
  • 2
    @agweber, this test create a "for loop" around indexOf and compare it to an other which includes an other for with a break... we are talking here of indexOf itself vs for/while – Pierre Maoui Nov 16 '18 at 13:43
  • 1
    Thank you @Lime. you are exactly correct. There is still a lot of information out on the web saying a for loop is fastest. I was scratching my head and not quite believing that, when I found this thread. And thank god I did! I almost reworked my solution because of the incorrect info still out there. This post is gold. – Courtney Foster Feb 15 '21 at 23:39
24

Ok, looking at the other benchmarks here I am scratching my head at the way that most developers seem to do their benchmarking.

Apologies, but the way it is done leads to horribly wrong conclusions, so I have to go a bit meta and give a comment on the answers provided.

What is wrong with the other benchmarks here

Measuring where to find element 777 in an array that never changes, always leading to index 117 seems so inappropriate for obvious reasons, that I have trouble explaining why. You can't reasonably extrapolate anything from such an overly specific benchmark! The only analogy I can come up with is performing anthropological research on one person, and then calling the findings a generalized overview of the entire culture of the country that this person lives in. The other benchmarks aren't much better.

Even worse: the accepted answer is an image without a link to the benchmark that was used, so we have no way to control if the code for that benchmark is correct (I hope it is a screenshot to a jsperf link that was originally in the question and later edited out in favour of the new jsben.ch link). It's not even an explanation of the original question: why one performs better than the other (a highly debatable statement to begin with).

First, you should know that not all benchmarking sites are created equal - some can add significant errors to certain types of measurements due to their own framework interfering with the timing.

Now, we are supposed to be comparing the performance of different ways to do linear search on an array. Think about the algorithm itself for a second:

  • look at a value for a given index into an array.
  • compare the value to another value.
    • if equal, return the index
    • if it is not equal, move to the next index and compare the next value.

That's the whole linear search algorithm, right?

So some of the linked benchmarks compare sorted and unsorted arrays (sometimes incorrectly labeled "random", despite being in the same order each iteration - relevant XKCD). It should be obvious that this does not affect the above algorithm in any way - the comparison operator does not see that all values increase monotonically.

Yes, ordered vs unsorted arrays can matter, when comparing the performance of linear search to binary or interpolation search algorithms, but nobody here is doing that!

Furthermore, all benchmarks shown use a fixed length array, with a fixed index into it. All that tells you is how quickly indexOf finds that exact index for that exact length - as stated above, you cannot generalise anything from this.

Here is the result of more-or-less copying the benchmark linked in the question to perf.zone (which is more reliable than jsben.ch), but with the following modifications:

  • we pick a random value of the array each run, meaning we assume each element is as likely to be picked as any other
  • we benchmark for 100 and for 1000 elements
  • we compare integers and short strings.

https://run.perf.zone/view/for-vs-while-vs-indexof-100-integers-1516292563568

https://run.perf.zone/view/for-vs-while-vs-indexof-1000-integers-1516292665740

https://run.perf.zone/view/for-vs-while-vs-indexof-100-strings-1516297821385

https://run.perf.zone/view/for-vs-while-vs-indexof-1000-strings-1516293164213

Here are the results on my machine:

https://i.stack.imgur.com/h0s4j.jpg

As you can see, the result changes drastically depending on the benchmark and the browser being used, and each of the options wins in at least one of the scenarios: cached length vs uncached length, while loop vs for-loop vs indexOf.

So there is no universal answer here, and this will surely change in the future as browsers and hardware changes as well.

Should you even be benchmarking this?

It should be noted that before you proceed to build benchmarks, you should determine whether or not the linear search part is a bottleneck to begin with! It probably isn't, and if it is, the better strategy is probably to use a different data structure for storing and retrieving your data anyway, and/or a different algorithm.

That is not to say that this question is irrelevant - it is rare, but it can happen that linear search performance matters; I happen to have an example of that: establishing the speed of constructing/searching through a prefix trie constructed through nested objects (using dictionary look-up) or nested arrays (requiring linear search).

As can be seen this github comment, the benchmarks involve various realistic and best/worst-case payloads on various browsers and platforms. Only after going through all that do I draw conclusions about expected performance. In my case, for most realistic situations the linear search through an array is faster than dictionary look-up, but worst-case performance is worse to the point of freezing the script (and easy to construct), so the implementation was marked as as an "unsafe" method to signal to others that they should think about the context the code would be used.

Jon J's answer is also a good example of taking a step back to think about the real problem.

What to do when you do have to micro-benchmark

So let's assume we know that we did our homework and established that we need to optimize our linear search.

What matters then is the eventual index at which we expect to find our element (if at all), the type of data being searched, and of course which browsers to support.

In other words: is any index equally likely to be found (uniform distribution), or is it more likely to be centered around the middle (normal distribution)? Will be find our data at the start or near the end? Is our value guaranteed to be in the array, or only a certain percentage of the time? What percentage?

Am I searching an array of strings? Objects Numbers? If they're numbers, are they floating point values or integers? Are we trying to optimize for older smartphones, up-to-date laptops, or school desktops stuck with IE10?

This is another important thing: do not optimize for the best-case performance, optimize for realistic worst-case performance. If you are building a web-app where 10% of your customers use very old smart phones, optimize for that; their experience will be one that is unbearable with bad performance, while the micro-optimization is wasted on the newest generation flagship phones.

Ask yourself these questions about the data you are applying linear search to, and the context within which you do it. Then make test-cases fitting for these criteria, and test them on the browsers/hardware that represents the targets you are supporting.

Job
  • 856
  • 10
  • 13
13

Probably because the actual indexOf implementation is doing a lot more than just looping through the array. You can see the Firefox internal implementation of it here:

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/indexOf

There are several things that can slow down the loop that are there for sanity purposes:

  • The array is being re-cast to an Object
  • The fromIndex is being cast to a Number
  • They're using Math.max instead of a ternary
  • They're using Math.abs
dreamlab
  • 3,321
  • 20
  • 23
  • The results are staggering.... I mean there not even close 0_0. Anyways Casting Array to an Object? Seriously it is already an Object. What is the motive there? It seems like an edge case. – Lime Jul 13 '11 at 17:37
  • Thinking about it more, the while loop successfully avoids needing the `Math.max` and `Math.abs`. Also with a little trickery you could probably add `fromIndex` support. – Lime Jul 13 '11 at 17:47
  • 1
    The code posted there is a polyfill, it's not the actual code in the implementation. – Barmar Apr 22 '16 at 21:19
  • @Barmar -- fair enough, but the polyfill tries to have a very similar implementation to the original and for the purposes of figuring out performance is generally a decent reference to look at. –  Apr 22 '16 at 23:00
  • The polyfill tries to have similar results, I doubt it's very similar to the implementation. In particular, the implementation can make use of efficient mechanisms internal to the Javascript implementation. – Barmar Apr 22 '16 at 23:02
  • 1
    @Barmar -- That wasn't the point I was making. The polyfill needs to do the same sanity checks that the internal implementation does, regardless of the specific mechanisms. For this purpose, it answers the original question of "why is native slower". –  Apr 22 '16 at 23:05
  • Anyway, those sanity checks are done just once, not in the loop. So they should have negligible impact on performance. – Barmar Apr 22 '16 at 23:07
  • @Barmar - jsperf is apparently down, so I can't check this, but given the 100M ops/second benchmarked, I'm guessing the original question is being run on a _very_ small array. You're right though, this eventually would become negligible. –  Apr 22 '16 at 23:11
5

indexOf does a bunch of type-checking and validation that the for loop and while loop ignore.

Here's the indexOf algorithm:

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/indexOf

Edit: My guess is indexOf is faster for big arrays because it caches the length of the array before looping through it.

shelman
  • 2,689
  • 15
  • 17
  • 1
    Yeah I had figured the same, but it doesn't appear any faster http://jsperf.com/js-for-loop-vs-array-indexof/10 And thats even a value that isn't in the array. – Lime Jul 13 '11 at 17:38
2

Run the test one more time with the edits I've made.

I've increased the size of the array, and made the index you're searching for larger as well. It seems in large arrays indexOf may be a faster choice.

http://jsben.ch/#/xm2BV

EDIT: Based on more tests, indexOf seems to run faster than a for loop in the version of Safari I'm using (5.0.3) and slower in just about everything else.

EscapeNetscape
  • 2,892
  • 1
  • 33
  • 32
Tyler Biscoe
  • 2,322
  • 3
  • 20
  • 33
  • 1
    The for loop is still faster in chrome. I'm curious what results you get with the while loop. I've merged the two here: http://jsperf.com/js-for-loop-vs-array-indexof/10 – Lime Jul 13 '11 at 17:33
1

It might be worth noting that if all you are trying to do is keep a list of items and check for existence (e.g. avoid adding duplicate IDs to an array), it would be far faster to keep an OBJECT with keys that reflect each ID. If you think I'm wrong, compare the following with an array + indexOf. We are talking 181ms for the object method vs. 1 MINUTE for the array indexOf method.

var objs = []
var i_uid = {} // method 1
var a_uid = [] // method 2
var total_count = 100000, idLen = 5
var ts, te, cObj = 0

// method 1
ts = new Date()
while (cObj < total_count) {
    var u = uid(idLen),
        o = {
            uid: u,
            text: 'something',
            created: new Date()
        }
    if (!i_uid[u]) { // ensure unique uids only
        objs.push(o)
        i_uid[u] = cObj // current array position as placeholder
        cObj++
    }
    else {
        console.log('unique violation [duplicate uid', u, ']')
    }
}
te = new Date()
console.log('loaded ' + total_count + ' with object method in', (te - ts), 'ms')

i_uid = {} // free-up
cObj = 0 // reset
objs = [] // reset

// method 2
ts = new Date()
while (cObj < total_count) {
    var u = uid(idLen),
        o = {
            uid: u,
            text: 'something',
            created: new Date()
        }
    if (a_uid.indexOf(u) == -1) { // ensure unique uids only
        objs.push(o)
        a_uid.push(u)
        cObj++
    }
    else {
        console.log('unique violation [duplicate uid', u, ']')
    }
}
te = new Date()
console.log('loaded ' + total_count + ' with array + indexOf method in', (te - ts), 'ms')

function uid(l) {
    var t = '',
        p = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
        pl = p.length
    for (var i = 0; i < l; i++)
        t += p.charAt(Math.floor(Math.random() * pl))
    return t
}
Jon J
  • 491
  • 5
  • 8