3

I've got a list of 100,000 items that live in memory (all of them big ints stored as strings).

The data structure these come in doesn't really matter. Right now they live in an array like so:

const list = ['1','2','3'...'100000'];

Note: The above is just an example - in reality, each entry is an 18 digit string.

I need to check for an object's existence. Currently I'm doing:

const needToCheck = '3';
const doesInclude = list.includes(needToCheck);

However there's a lot of ways I could do this existence check. I need this to be as performant as possible.

A few other avenues I could follow are:

  1. Create a Map with the value being undefined
  2. Create an object ({}) and create the keys of the object as the entries in list, then use hasOwnProperty.
  3. Use a Set()
  4. Use some other sort of data structure (a tree?) due to the fact that these are all numbers. However, due to the fact that these are all 18 digits in length, so maybe that'll be less performant.

I can accept a higher upfront cost to build the data structure to get a bigger speed increase later, as this is for a URL route that will be hit >1MM times a day.

Lee Taylor
  • 7,761
  • 16
  • 33
  • 49
JonLuca
  • 850
  • 6
  • 25
  • 2
    IMO `Set` is perfect choice for such case, read time for Set is `O(1)` – Code Maniac Sep 13 '19 at 01:42
  • Set or object with undefined. No need for a Map since they're all strings. – Jared Smith Sep 13 '19 at 01:45
  • “(all of them big ints stored as strings)” -> This is a terrible idea for performance. Try a `BigUint64Array` on applicable platforms instead, or some `Uint32Array`-based shim. – Veedrac Sep 13 '19 at 05:04

2 Answers2

3

Array.prototype.includes is an O(n) operation, which is not desirable - every time you want to check whether a value exists, you'll have to iterate over much of the collection (perhaps the entire collection).

A Map, Set, or object are better, since checking whether they have a value is an O(1) operation.

A tree is not desirable either, because lookup will necessarily take a number of operations down the tree, which could be an issue if the tree is large and you want to lookup frequently - so the O(1) solution is better.

A Map, while it works, probably isn't appropriate because you just want to see if a value exists - you don't need key-value pairs, just values. A Set is composed of only values (and Set.has is indeed O(1)), so that's the best choice for this situation. An object with keys, while it could work too, might not be a good idea because it may create many unnecessary hidden classes - a Set is more designed towards dynamic values at runtime.

So, the Set approach looks to be the most performant and appropriate choice.

You might also consider the possibility of moving the calculation to the server. 100,000 items isn't necessarily too much, but it's still a surprisingly large amount to see client-side.

CertainPerformance
  • 356,069
  • 52
  • 309
  • 320
0

Unconventionally, you could also use an object and set each of your 100,000 items as a property because under the hood, the JavaScript Object is implemented with a hash table.

For example,

var numbers = {
"1": 1243213,
"2": 4314121,
"3": 3142123
...
}

You could then very quickly check if an item existed by checking if numbers["1"] === undefined. And not only that, but you can also get the value of of the property at the same time.

However, this method does come with some drawbacks like iterating through the list becoming a lot more complicated (though still possible).

For reference, see https://stackoverflow.com/a/24196259/8250558

Moffen
  • 1,817
  • 1
  • 14
  • 34