3

In my code, I have a list l, and I'm creating a dictionary of lists from it. (I'm grouping objects which have the same key). Implementing it via both a try statement and a if condition, I've noticed in line_profiler that the former seems much more efficient :

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================

   293  44378450     59805020      1.3     16.9       for element in l:

                                                        # stuff that compute 'key' from 'element'

   302   2234869      2235518      1.0      0.6              try:                                                         
   303   2234869     82486133     36.9     23.3                  d[key].append(element)                              
   304     57358        72499      1.3      0.0              except KeyError:                                             
   305     57358      1758248     30.7      0.5                  d[key] = [element]  

vs:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================

   293  44378450     60185880      1.4     14.0       for element in l:      

                                                        # stuff that compute 'key' from 'element'

   307   2234869     81683512     36.5     19.1              if key in d.keys():                                  
   308   2177511     76393595     35.1     17.8                  d.get(key).append(element)                       
   309                                                       else:                                                
   310     57358      1717679     29.9      0.4                  d[key] = [element]                               

I understand that with try, you only go into except when an exception is raised (so with few exceptions, it makes sense it cost less overall than testing the condition every time), but here, even the Time per hit is slower for the exception (1.3+30.7 µs) than for the test condition (36.5 µs). I thought raising exception would be more costly than checking if a key is in the dictionary (in just test the hash key, no ? It's not a line search). So why is that ?

tynn
  • 38,113
  • 8
  • 108
  • 143
Nihl
  • 557
  • 1
  • 5
  • 14
  • 1
    Is this Python 2.x ? – Anand S Kumar Aug 04 '15 at 08:49
  • It's Python 3.4. Why the downvote by the way ? – Nihl Aug 04 '15 at 08:52
  • 3
    The `if` version has to look up `key` twice, once in the `if` test and once in the `get`. And if this is Python 2 it'll be even slower, since `d.keys()` is a list, not a view, so look up involves a linear search rather than a set-like hash-based test. – PM 2Ring Aug 04 '15 at 08:55
  • 4
    The Pythonic way to do this is to use `d = defaultdict(list)`. Then `d[key].append(element)` would always work. – interjay Aug 04 '15 at 08:58
  • I thought that too, but I'm using 3.4, and `type(d.keys())` gives me `dict_keys`, so it's not a list. – Nihl Aug 04 '15 at 08:58
  • 1
    Ok, But you're still doing 2 look ups in the `if` version (if the key exists). FWIW, there's some good info about the relative speed of `if` vs `except` in [Using try vs if in python](http://stackoverflow.com/q/1835756/4014959) – PM 2Ring Aug 04 '15 at 09:01
  • True, it does the hash twice (I'm just a bit surprised a hash test took this much). Thanks for the link. – Nihl Aug 04 '15 at 09:04
  • [Branch prediction](http://stackoverflow.com/a/11227902/4871483) could also be a factor, yes? – rassa45 Aug 04 '15 at 10:02

4 Answers4

4

The additional runtime comes from the .keys() call. If you want to prevent that extra call and still stay with if and else, try something like this:

obj = d.get(key)
if obj:
      obj.append(element)
else:
      d[key] = [element]

Alternatively you can use a defaultdict which does this in the background. Example:

from collections import defaultdict
d = defaultdict(list)
d['123'].append('abc')
Klaus D.
  • 13,874
  • 5
  • 41
  • 48
3

try...except is slower only if the number of exceptions actually raised is comparable to the number of times the loop is executed. In your case, exceptions are only raised 2.5% of loop iterations.

Lets analyse the four scenarios below -

def func1():
    l = [1,2,3,4]
    d = {}
    for e in l:
        k = e - 1
        try:
            d[k].append(e)
        except KeyError:
            d[k] = [e]
    return d


def func2():
    l = [1,2,3,4]
    d = {}
    for e in l:
        k = e - 1
        if k in d.keys():
            d.get(k).append(e)
        else:
            d[k] = [e]
    return d

def func3():
    l = [1,2,3,4]
    d = {}
    for e in l:
        k = 1
        try:
            d[k].append(e)
        except KeyError:
            d[k] = [e]
    return d


def func4():
    l = [1,2,3,4]
    d = {}
    for e in l:
        k = 1
        if k in d.keys():
            d.get(k).append(e)
        else:
            d[k] = [e]
    return d

The timing results for this -

In [7]: %timeit func1()
The slowest run took 4.17 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.55 µs per loop

In [8]: %timeit func2()
1000000 loops, best of 3: 1.77 µs per loop

In [10]: %timeit func3()
The slowest run took 4.34 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 2.01 µs per loop

In [11]: %timeit func4()
The slowest run took 6.83 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.4 µs per loop
  1. In case of func1() and func2() , each element is going into a separate list, and hence for each key the try..except block raises and catches an exception. At that case, func2() is faster.

  2. In case of func3() and func4() , the exception is only thrown once, so the overhead of exception only occurs once, whereas the condition is still checked for each key (even if its present) , and that is why in that case, the try..except is faster.


I am guessing something similar may be occurring in your case, where the same key is computed multiple times, and hence its faster with the try..except block. You can check out how many actual elements are there in the list vs how many keys are there in the dictionary to find out if that could be the reason.

Assuming the hits column is the number of times that particular line was executed, you can see that , the line -

d[key].append(element)

Was executed 2234869 times, whereas exception was raised only - 57358 times , Which is only 2.56% of the total number of elements.

Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
1

You should think that at every iteration you loose some time testing if condition checks. If you don't raise an exception your code will be faster with try except, otherwise it takes a little bit more time to treat the exception.

In other word, if you are sure that your exception is extraodinary (it happens only in exceptional cases) it's cheaper for you to use try-except. Otherwise, if you except an exception in ~50% of cases it's better for you to use if.

("it's easier to ask for forgiveness than permission")

dskfdskjgds
  • 341
  • 3
  • 14
0

I learned in one of my classes that the processor tries to rpedict and load in memory the instructions. If you put a if statement, the processor does not know which code to load and there is a waste of time loaing/unloading instructions etc.

Probably, in case of an exception, the processor assum tat the exception is rare and loads the main code without considering the exception...

I don't know f it is the case here, but that is what we learned in code optimizartion class.

source(class https://www.etsmtl.ca/etudes/cours/gti320/)

  • 1
    As the first answer explains, the additional time comes from the `.keys()`. while your point is also valid, it has a much smaller effect then the `.keys()`. – Kesslwovv Nov 15 '21 at 20:15