1

I tried to compare dict and if-else which is faster as follows.

d = {
    str  : lambda x: "'%s'" % str(x),
    int  : lambda x: str(x),
    float: lambda x: str(x),
}
items = ['a', 'b', 'c', 1, 2, 3, 4, 5, 1.0]

def use_dict():
    r = []
    for i in items:
        r.append(d[type(i)](i))
    return r

def use_if():
    r = []
    for i in items:
        if isinstance(i, str):
            r.append("'%s'" % str(i))
        elif isinstance(i, (int, float)):
            r.append(str(i))
    return r

if __name__ == '__main__':

    from timeit import timeit

    print 'use_dict:', timeit(use_dict)
    # -> use_dict: 9.21109666657

    print 'use_if  :', timeit(use_if)
    # -> use_if  : 10.9568739652

I found dict is faster than if-else. This means when I want to write a switch-statement, dict is better solution. But I have a doubt Why dict is faster? Anyone can explain it. Thanks.

Burger King
  • 2,945
  • 3
  • 20
  • 45
  • 4
    That isn't just a test of `dict` vs `if`, because at the same time you're testing `type(i) ==` vs `isinstance`. It might be better to test one thing at a time before drawing wide conclusions. – Dan Getz Jul 18 '15 at 06:14
  • 1
    Also, it looks like you might be trying to write yourself what `repr()` will already do for you. – Dan Getz Jul 18 '15 at 06:16

2 Answers2

6

If you want to get an idea of how your code executes, take a look at the dis module.

A quick example...

import dis

# Here are the things we might want to do
def do_something_a():
    print 'I did a'


def do_something_b():
    print 'I did b'


def do_something_c():
    print 'I did c'


# Case 1
def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()


# Case 2
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}
def f2(x):
    FUNC_MAP[x]()


# Show how the functions execute
print 'Case 1'
dis.dis(f1)
print '\n\nCase 2'
dis.dis(f2)

...which outputs...

Case 1
 18           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       22

 19          12 LOAD_GLOBAL              0 (do_something_a)
             15 CALL_FUNCTION            0
             18 POP_TOP
             19 JUMP_FORWARD            44 (to 66)

 20     >>   22 LOAD_FAST                0 (x)
             25 LOAD_CONST               2 (2)
             28 COMPARE_OP               2 (==)
             31 POP_JUMP_IF_FALSE       44

 21          34 LOAD_GLOBAL              1 (do_something_b)
             37 CALL_FUNCTION            0
             40 POP_TOP
             41 JUMP_FORWARD            22 (to 66)

 22     >>   44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (3)
             50 COMPARE_OP               2 (==)
             53 POP_JUMP_IF_FALSE       66

 23          56 LOAD_GLOBAL              2 (do_something_c)
             59 CALL_FUNCTION            0
             62 POP_TOP
             63 JUMP_FORWARD             0 (to 66)
        >>   66 LOAD_CONST               0 (None)
             69 RETURN_VALUE


Case 2
 29           0 LOAD_GLOBAL              0 (FUNC_MAP)
              3 LOAD_FAST                0 (x)
              6 BINARY_SUBSCR
              7 CALL_FUNCTION            0
             10 POP_TOP
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE

...so it's pretty easy to see which function has to execute the most instructions.

As for which is actually faster, that's something you'd have to check by profiling the code.

The if/elif/else structure compares the key it was given to a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside the if block. This can take a long time, because so many checks (n/2 on average, for n possible values) have to be made for every lookup.

The reason that a sequence of if statements is more difficult to optimize than a switch statement is that the condition checks (what's inside the parens in C++) might conceivably change the state of some variable that's involved in the next check, so you have to do them in order. The restrictions on switch statements remove that possibility, so the order doesn't matter (I think).

Python dictionaries are implemented as hash tables. The idea is this: if you could deal with arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers that is indexed just by casting whatever your lookup value is to an integer and using that as the index. Lookup would be virtually instantaneous.

You can't do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which generates some integer, depending on the lookup value), then % your result with the length of your array to get an index within the bounds of that array. That way, lookup takes as much time as is needed to call the hash function once, take the modulus, and jump to an index. If the amount of different possible lookup values is large enough, the overhead of the hash function becomes negligible compared to those n/2 condition checks.

(Actually, since many different lookup values will inevitably map to the same index, it's not quite that simple. You have to check for and resolve possible conflicts, which can be done in a number of ways. Still, the gist of it is as described above.)

Community
  • 1
  • 1
fred1234
  • 429
  • 1
  • 4
  • 26
5

I am guessing it would be because for cases where the element in items is not a string, The if..elif solution actually ends up calling isinstance() function twice, which may be adding to the cost. And function calls in python are costly.

Whereas your dict solution only calls type() once in all cases.

As an example , I converted the list of items to all strings, and the if..elif solution was faster that time -

d = {
    str  : lambda x: "'%s'" % str(x),
    int  : lambda x: str(x),
    float: lambda x: str(x),
}
items1 = ['a', 'b', 'c', 1, 2, 3, 4, 5, 1.0]
items = ['a','b','c','d','e','f','g','h','i','j']

def use_dict():
    r = []
    for i in items:
        r.append(d[type(i)](i))
    return r

def use_if():
    r = []
    for i in items:
        if isinstance(i, str):
            r.append("'%s'" % str(i))
        elif isinstance(i, (int, float)):
            r.append(str(i))
    return r

if __name__ == '__main__':

    from timeit import timeit

    print('use_dict:', timeit(use_dict))

    print('use_if  :', timeit(use_if))

Result of running it on all strings -

C:\Users\anandsk>python a.py
use_dict: 7.891252114975529
use_if  : 6.850442551614534
Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176