
I'd like to be able to test the efficiency of an IF statement against the dictionary "case statement" hack in Python. Since there is no case statement, I am currently using the dictionary method. It looks like so:

    self.options = {"0": self.racerOne,
        "1": self.racerTwo,
        "2": self.racerThree,
        "3": self.racerFour,
        "0f": self.racerFinish,
        "1f": self.racerFinish,
        "2f": self.racerFinish,
        "3f": self.racerFinish,
        "x": self.emptyLine,
        "t": self.raceTime,
        "G": self.letsGo,
        "CD": self.countDown,
        "C": self.emptyLine,
        }
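
For context, the lookup itself is just a dict access on the parsed token, something along these lines (the handleLine name and the fallback to self.emptyLine are simplified stand-ins, not my exact code):

    def handleLine(self, line):
        key = line.strip()                               # parsed token from the stream
        handler = self.options.get(key, self.emptyLine)  # fall back on unknown tokens
        handler()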

Real-world data will vary, but I have a way to run a controlled test that reads 688 lines of streaming data over 6.4 seconds.
I have also read this post: Python Dictionary vs If Statement Speed. I am going to check out the cProfile method as well.
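
For reference, here is a minimal cProfile sketch of what I have in mind (the runControlledTest function is just a placeholder for my test harness, not an existing function):

    import cProfile
    import pstats

    def runControlledTest():
        # placeholder: in the real app this replays the 688 captured lines
        pass

    # Profile the controlled run and report the functions with the most cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    runControlledTest()
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)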

Does anyone have any other suggestions on how I can accurately measure an IF statement compared to the dictionary option? By efficient I mean using the least processing power and keeping up with the stream better.
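
The closest I have so far is a head-to-head timeit sketch along these lines (the handler function here is a simplified stand-in for my real methods):

    import timeit

    def handler():
        pass   # stand-in for the real racer/timer methods

    options = {str(i): handler for i in range(4)}

    def dict_dispatch(key):
        options[key]()

    def if_dispatch(key):
        if key == "0": handler()
        elif key == "1": handler()
        elif key == "2": handler()
        elif key == "3": handler()

    # Time both styles on the same worst-case key; lower is better.
    print(timeit.timeit(lambda: dict_dispatch("3"), number=1000000))
    print(timeit.timeit(lambda: if_dispatch("3"), number=1000000))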

Over those 6.4 seconds I read each line of streaming data, parse it, evaluate it, then visually display it in real time. I don't think there is going to be much difference running my application on a Windows or OS X system, but it also has to run on a Raspberry Pi, where processing power is limited.

Thanks in advance.

Husar
    Have you actually tried running the code? Is it not fast enough? If not, you may be trying to fix a problem you don't have. – Kedar Mar 12 '15 at 03:45

1 Answer


It sounds as though this statement will not be your major area to optimize.

However, out of curiosity, I examined it anyway. The intuitive answer, which is given in the question you linked to, is that Python dictionaries are implemented as hash tables, so lookup should scale at around O(1) with the number of items. If/elif chains like the one you showed will scale at O(n), since the conditions are checked sequentially. Running 1000 random numbers through functions using each approach, with anywhere from 2 to 1000 choices, I got the following timing (the y scale is seconds per choice, on a log scale; the x scale is the number of possible choices; if chains are blue, dict lookup is green):

[Plot: time per choice (seconds, log scale) vs. number of possible choices; if chains (blue) vs. dict lookup (green)]

As can be seen, lookup is fast, and much faster than long if statement chains.

Even for short chains, in this code, lookup is still faster or around the same:

[Plot: the same comparison for a small number of choices; dict lookup is comparable to or faster than short if chains]

But note the times here. We're talking about times in the sub-microsecond range per choice on my computer: around 600ns for a low number of choices. At that point, the overhead may be from things as simple as function calls. On the other hand, if you have a huge number of possible choices, the best thing to use should be pretty clear.

The code for the above is below. It uses numpy to keep track of all the times. For simple timing issues like this, it's usually easiest to just use time.time() to get a time value before and after whatever you want to do. For something very fast, you'll need to loop through multiple times and take an average of the times.
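
As a minimal illustration of that pattern (with a throwaway function standing in for whatever you actually want to time):

    import time

    def fast_operation():
        return 1 + 1   # placeholder for the thing being timed

    # A single call is far too fast to time, so repeat it and average.
    n_repeats = 100000
    ts = time.time()
    for _ in range(n_repeats):
        fast_operation()
    print("average per call:", (time.time() - ts) / n_repeats)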

I should add the caveat that the way I created the if statement chains was mildly evil: I build the function source as a string and exec it. It's possible that in doing so the function was somehow not optimized in the same way; I'm not an expert on Python internals.

    import numpy as np
    import time

    def createdictfun(n):
        # Build a dict mapping each choice to a result, and a function that looks it up.
        optdict = {x: 2 * x for x in range(0, n)}
        def dictfun(x):
            return optdict[x]
        return dictfun

    def createiffun(n):
        # Build the equivalent if/elif chain as source code and exec it (the "mildly evil" part).
        s = "def iffun(x):\n"
        s += "  if x==0: return 0\n"
        for i in range(1, n):
            s += "  elif x=={}: return {}\n".format(i, i * 2)
        s += "  else: raise ValueError\n"
        namespace = {}
        exec(s, namespace)   # exec into an explicit namespace so the function can be retrieved
        return namespace["iffun"]

    ntry = 10          # timing repetitions per choice count
    maxchoice = 1000   # largest number of choices tested
    trialsize = 1000   # random inputs per timing run
    ifvals = np.zeros((maxchoice, 2))
    dictvals = np.zeros((maxchoice, 2))
    ns = np.arange(1, maxchoice)
    for n in ns:
        ift = np.zeros(ntry)
        dictt = np.zeros(ntry)
        vals = np.random.randint(0, n, size=trialsize)
        iffun = createiffun(n)
        dictfun = createdictfun(n)
        for trial in range(0, ntry):
            # Time the if/elif chain over all random inputs.
            ts = time.time()
            for x in vals:
                iffun(x)
            ift[trial] = time.time() - ts
            # Time the dict lookup over the same inputs.
            ts = time.time()
            for x in vals:
                dictfun(x)
            dictt[trial] = time.time() - ts
        # Store mean and standard deviation per single call.
        ifvals[n, 0] = np.mean(ift) / trialsize
        ifvals[n, 1] = np.std(ift) / trialsize
        dictvals[n, 0] = np.mean(dictt) / trialsize
        dictvals[n, 1] = np.std(dictt) / trialsize
        print(n, end=" ")
cge