Efficiently accessing indices: is storing indices in a dict the fastest way?

Question

I have a list of tuples that I need to access the index of repeatedly. Is the fastest way to do this to store the tuples and their indices in a dict? Or is there a better way?

Given a tuple that exists in an ordered list of tuples, I need to access the index of it efficiently. I can think of two ways to do it; call .index on the list, or store the tuples as keys in a dictionary with the index as their value and look up the index by doing a dictionary lookup.

order = list(itertools.product(range(4),repeat=6))
%timeit order.index(random.choice(order))
# 10000 loops, best of 3: 35 µs per loop

order_dic = {i:order.index(i) for i in order}
%timeit order_dic[random.choice(order)]
# 1000000 loops, best of 3: 443 ns per loop

Are the indexes sequential numbers? Then use a list, otherwise a dict. — poke, Mar 31 '16 at 17:41
@poke I'm not sure what you're asking. They're indices, so they have to be sequential. Calling `.index` on a list is somewhat inefficient, though, and performing a dict lookup is faster. — C_Z_, Mar 31 '16 at 17:44
`list.index()` *searches* the list for a value, it’s not an index lookup. So you are comparing two very different things here. Please show what you’re actually trying to do; I can’t tell from your question. — poke, Mar 31 '16 at 17:49
@poke Given a tuple, I need to find out what its index is. I've explained it in the question. — C_Z_, Mar 31 '16 at 18:01

Tuomas Laakkonen · Accepted Answer · 2016-03-31T18:32:21.510

Almost always, the second option will be faster. This is because in the first example, you search the list in order to find the index, an O(n) operation on lists. Whereas in the second example, you lookup a value in the dictionary, which is O(1) on average.

However, space is something to consider here as the second option has the overhead of an additional dict. In Brandon Rhodes's talk at Pycon 2014, he says that dicts tend to use 3 to 4 times more space than they need, so that adding new elements is still fast, which could be a lot of overhead if you have a large list of tuples.

If list of tuples you use as an example is actually the list you are searching through, then there is a fast and memory-efficient way to calculate the index of a given tuple, relying on the way itertools.product works:

a[5] + a[4] >> 2 + a[3] >> 4 + a[2] >> 6 + a[1] >> 8 + a[0] >> 10

This will give the index of tuple a. This is nearly as fast as the second option but does not have the additional space requirement of an extra dict.

(The idea behind this method is that the tuple is essentially a six-digit base-four number, and since itertools.product outputs them in order, this will give the index.)

Another interesting option is `from collections import namedtuple`, if the index is static. Not sure how the attribute lookup compares to the index lookup, though. — Wayne Werner, Mar 31 '16 at 18:27
Thanks for the tip about calculating the indices based on the value using the properties of `itertools.product`; I might do that. Then again, the list of tuples I'm going to have isn't going to be terribly large (probably not more than a few thousand), so I might just stick with the dict lookup. — C_Z_, Mar 31 '16 at 18:30

score 0 · Answer 2 · edited May 23 '17 at 11:59

Looks like it will be really hard to get any faster than a dict(). Just see how long random.choice() takes without any lookup:

>>> order = list(itertools.product(range(4),repeat=6))
>>> order_dic = {i:order.index(i) for i in order}

>>> %timeit order.index(random.choice(order))
>>> %timeit order_dic[random.choice(order)]
>>> %timeit random.choice(order)

10000 loops, best of 3: 98.6 µs per loop
1000000 loops, best of 3: 1.6 µs per loop
1000000 loops, best of 3: 1.39 µs per loop

So, the lookup consists mere 15% of the total you were measuring.

But maybe you could consider storing the index in the tuple? Or storing the tuple in another tuple, that also has the index? For example order = list(enumerate(order)).

You could assign the index to each tuple object:

order = list(itertools.product(range(4),repeat=6))
order_dic = {i:order.index(i) for i in order}

%timeit order.index(random.choice(order))
%timeit order_dic[random.choice(order)]
%timeit random.choice(order)

class mytuple(tuple): pass

order = list(map(mytuple, order))
for i, t in enumerate(order): t.rev_index = i

%timeit random.choice(order).rev_index

## -- End pasted text --
10000 loops, best of 3: 101 µs per loop
1000000 loops, best of 3: 1.59 µs per loop
1000000 loops, best of 3: 1.44 µs per loop
1000000 loops, best of 3: 1.53 µs per loop

It's faster than the hash map.

You can't set attributes of builtins dynamically, so we have to subclass it. We also can't use __slots__ because variable-size types like tuple don't support it, so every mytuple will have an additional __dict__. But apparently this still has a smaller memory footprint (almost twice) than using one dict to map all the indexes:

>>> from pympler.asizeof import asizeof
>>> asizeof(order_dic) \
... / (asizeof(order) - asizeof(list(itertools.product(range(4),repeat=6))))
1.8341168569509738

I'm not sure what the lookup time of `random.choice()` has to do with my question. Also, I'm not sure what `order = list(enumerate(order))` is supposed to do; given a tuple that I need to find the index of, I would still need to search through the list to find the tuple, same as `.index()` — C_Z_, Mar 31 '16 at 18:37
@CactusWoman The `order = list(enumerate(order))` means that each element of of order will be a tuple with it's index in the list first and its value second, so you access the normal value with `a[1]` and then to get the index, you can simply use `a[0]` (for some element `a`). — Tuomas Laakkonen, Mar 31 '16 at 18:41
Ok, now I get what you're saying. That could work for the question as I've described it (although for my implementation it won't work because I'll be generating the tuples separately and then looking them up) — C_Z_, Mar 31 '16 at 18:45
@CactusWoman: See my updated answer. Maybe subclassing a tuple could work for you. — arekolek, Mar 31 '16 at 18:48

Efficiently accessing indices: is storing indices in a dict the fastest way?

2 Answers2