8

After reading many similar questions:

I still have a question: suppose I have a large array of strings (several thousands), and I have to make many lookups (i.e. check many times whether a given string is contained in this array). What is the most efficient way to do this in Node.js ?

A. Sort the array of strings, then use binary search? or:

B. Convert the strings to keys of an object, then use the "in" operator

?

I know that the complexity of A is O(log N), where N is the number of strings.

But I don't know the complexity of B.

IF a Javascript object is implemented as a hash table, then the complexity of B is, on average, O(1), which is better than A. However, I don't know if a Javascript object is really implemented as a hash table!

Community
  • 1
  • 1
Erel Segal-Halevi
  • 33,955
  • 36
  • 114
  • 183
  • an object property lookup will be almost instant in V8, and there is no binary search. you can use [].filter(/./.test, /^bob$/i) to find string elements of "bob", which is faster than user-land search code... – dandavis Jun 12 '13 at 20:30
  • @apsillers I edited my question to make it clear that I am interested mainly in the Node.JS implementation. – Erel Segal-Halevi Jun 12 '13 at 20:35
  • 1
    @dandavis I'm almost certain that a simple `Array.indexOf()` is faster then that. – basilikum Jun 12 '13 at 20:45
  • It would be very surprising if any JS implementation did not use a hash, or something comparably efficient, to implement objects. – Barmar Jun 12 '13 at 20:45
  • @basilikum - array lookup is lots slower than object property lookup. One is a linear search, the other is a hashed search. – jfriend00 Jun 12 '13 at 20:46
  • @basilikum - yes, indexOf is usually about 10-20X faster than regexp in node.js, i thought he needed a count of certain strings... – dandavis Jun 12 '13 at 20:48
  • @jfriend00 yes, I wasn't very clear. I meant `[].filter(...)` is slower than `ìndexOf`. – basilikum Jun 12 '13 at 20:53
  • 1
    @ErelSegalHalevi just play a little with this [fiddle](http://jsfiddle.net/BTrLH/2/) (preferably in Chrome) or do something similar directly in node.js. I only tested `indexOf` vs. object property lookups and yes, object property lookups are insanely fast and seem to perfom with O(1). – basilikum Jun 12 '13 at 20:56

2 Answers2

6

Update for 2016

Since you're asking about node.js and it is 2016, you can now use either the Set or Map object from ES6 as these are built into ES6. Both allows you to use any string as a key. The Set object is appropriate when you just want to see if the key exists as in:

if (mySet.has(someString)) {
    //code here
}

And, Map is appropriate when you want to store a value for that key as in:

if (myMap.has(someString)) {
    let val = myMap[someString];
    // do something with val here
}

Both ES6 features are now built into node.js as of node V4 (the current version of node.js as of this edit is v6).

See this performance comparison to see how much faster the Set operations are than many other choices.

Older Answer

All important performance questions should be tested with actual performance tests in a tool like jsperf.com. In your case, a javascript object uses a hash-table like implementation because without something that performs pretty well, the whole implementation would be slow since so much of javascript uses object.

String keys on an object would be the first thing I'd test and would be my guess for the best performer. Since the internals of an object are implemented in native code, I'd expect this to be faster than your own hashtable or binary search implemented in javascript.

But, as I started my answer with, you should really test your specific circumstance with the number and length of strings you are most concerned about in a tool like jsperf.

jfriend00
  • 683,504
  • 96
  • 985
  • 979
  • 1
    Thanks, but I am interested mainly in the Node.JS implementation (I edited my question to reflect that), so jsperf.com is not relevant. – Erel Segal-Halevi Jun 12 '13 at 20:38
  • 3
    @ErelSegalHalevi - then design your own performance test in node.js or use the benchmarking engine from jsperf.com in node.js. node.js is the same javascript engine that's in Chrome so you'd get first order results by trying jsperf.com in the Chrome browser. If you want to then verify that in node.js, you will have to move your own benchmark test to node.js. This answer here won't get handed to you on a silver platter. If you really want to know which is faster, you have to test your particular circumstances. I'd suggest you use performance benchmarking tools that have already been built. – jfriend00 Jun 12 '13 at 20:45
2

For fixed large array of string I suggest to use some form of radix search Also, take a look at different data structures and algorithms (AVL trees, queues/heaps etc) in this package

I'm pretty sure that using JS object as storage for strings will result in 'hash mode' for that object. Depending on implementation this could be O(log n) to O(1) time. Look at some jsperf benchmarks to compare property lookup vs binary search on sorted array.

In practice, especially if I'm not going to use the code in browser I would offload this functionality to something like redis or memcached.

Andrey Sidorov
  • 24,905
  • 4
  • 62
  • 75
  • 1
    This test: http://jsperf.com/binary-search-versus-indexof-versus-hash-key-check/2 clearly shows that objects are much faster than binary search. – Erel Segal-Halevi Jun 13 '13 at 04:59