I would like to have a quick index lookup when using a pandas dataframe. As noted here and there, I understand that I need to keep my index unique, otherwise all hope is lost.
I made sure that the index is sorted and unique:
df = df.sort_index()
assert df.index.is_unique
I measured the lookup speed:
%timeit df.at[tuple(q.values), 'column']
1000 loops, best of 3: 185 µs per loop
Then, when I moved the index to a separate python dictionary:
index = {}
for i in df.index:
index[i] = np.random.randint(1000)
assert len(index) == len(df.index)
I got a huge speedup:
%timeit index[tuple(q.values)]
100000 loops, best of 3: 2.7 µs per loop
Why is it so? Am I doing something wrong? Is there a way to replicate python dict's speed (or something in <5x range) in a pandas index?