1

Say I have millions of string IDs, I want to store them in a variable and check if one ID exists, there are both ways I can think of, list and dict:

Using list

>>> timeit_a = timeit.Timer('"9999999" in a', setup='a = [str(i) for i in range(3000000)]')
>>> timeit_a.timeit(1)
0.06293477199994868

Using dict

>>> timeit_b = timeit.Timer('"9999999" in b', setup='b = {str(i): None for i in range(3000000)}')
>>> timeit_b.timeit(1)
3.860999981952773e-06  # equal to 0.00000386099

As we can see using dict is much much much faster, but I feel creating the dict with bunch of Nones for the sake of just utilizing the hashmap of keys is not very elegant.

Is there a more canonical and more elegant way to do it?

James Lin
  • 25,028
  • 36
  • 133
  • 233

3 Answers3

6

If you have no values, use a set(), not a dict

{str(i) for i in range(30000)}

If you have millions of items, though, maybe offloading to Redis, for example, would be better for an application's memory usage / performance perspective

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
  • Awesome, thanks! Never thought of that! – James Lin Dec 09 '20 at 02:53
  • 3
    I'd suggest an actual `set` comprehension, not `set()` constructor around a generator expression. `{str(i) for i in range(30000)}` will run faster (it avoids the overhead of saving and restoring generator state on yield). `set(map(str, range(30000)))` is probably faster than that, but best not to encourage `map` for trivial gains (it only gives gains in special cases; it's a pessimization any time you have to use a `lambda` that a comprehension could inline). – ShadowRanger Dec 09 '20 at 02:53
4

Definitely use a set. It is like a dict, but without the values, as it is not a mapping but a... set, surprisingly enough.

a = {str(i) for i in range(300000)} # one way of initializing a set
a = set()
for i in range(3000000):
    a.add(str(i)) # another way
Captain Trojan
  • 2,800
  • 1
  • 11
  • 28
1

You want a set. A set is basically a dict with no values. It's a collection of items with the performance of a dict lookup for asking if something is in the set.

timeit_b = timeit.Timer('"9999999" in b', setup='b = {str(i) for i in list(range(3000000))}')
timeit_b.timeit(1)
CryptoFool
  • 21,719
  • 5
  • 26
  • 44