Imagine we have an array of unique integers. Given an integer (N
) of that list, I want to be able to get its index (I
) in the array as fast as possible.
My idea was to generate an object that given N
returns I
. I though of using either a structured array with data type (N,I)
and sorted by N
or simply a dictionary with keys N
.
The search speeds of both methods seems independent of the size of the object which leads me to believe that they are controlled by the overheads. However, I was a bit surprised to find that searching the dictionary was almost 10x faster than searching the structured array. So my questions are:
- Why is the dictionary so much faster than my array implementation?
- Is there an alternative method that is even faster than these two?
MWE:
from __future__ import division
import numpy as np
import timeit
#Time a function
def Timeme(funct,var,NN=10,NNN=10):
for i in xrange(NN):
start =timeit.default_timer()
for t in xrange(NNN):
funct(*var)
end =timeit.default_timer()
print str(i)+': '+str((end - start)/NNN*1000)
#Function to build a dictionary
def mydict(Flist):
Mydict=dict()
for n,i in Flist:
Mydict[n]=i
return Mydict
#Functions to access the data
def myfd(Mydict,vtest):
return Mydict[vtest]
def myfs(Flist,vtest):
n=Flist['N'].searchsorted(vtest)
return Flist['I'][n] #Flist[n]['I'] is slower
#N=100000
N=100
# "Allocate empty structured array"
Flist=np.empty(N,dtype=[('N','i4'),('I','i4')])
# "Fill N with randoms and I with sequence"
Flist['N'] = np.random.randint(N*1000,size=N)
Flist['I'] = np.arange(N)
# "Create test value"
ntest=np.random.randint(N)
vtest=Flist['N'][ntest]
# "Sort array on N"
Flist.sort(order='N')
# "Make dictionary"
Mydict=dict(Flist)
# "Get values"
nrd=myfd(Mydict,vtest)
nrs=myfs(Flist,vtest)
print "Tests OK: " + str(ntest == nrd and ntest == nrs)
print "\nSearch with Dictionary:"
Timeme(myfd,[Mydict,vtest],NN=5,NNN=100)
print "\nSearch directly in Array:"
Timeme(myfs,[Flist,vtest],NN=5,NNN=100)
Result:
Tests OK: True
Search with Dictionary:
0: 0.000404204885682
1: 0.000409016848607
2: 0.000418640774457
3: 0.000404204885682
4: 0.000394580959833
Search directly in Array:
0: 0.00455211692685
1: 0.00465798011119
2: 0.00458580066732
3: 0.00464354422242
4: 0.00476384329554