2

This came up in a nitpick discussion about the prefered style for iterating over dictionary keys if you need to apply some test to the value. I was comparing the performance of [k for k in d if d[k] == 1] against [k for k, v in d.items() if v == 1].

Looks like dictionary lookups are more expensive in Python 3.4:

$ python -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k in d if d[k] == 1]'
1000000 loops, best of 3: 2.17 usec per loop

$ python -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k, v in d.items() if v == 1]'
1000000 loops, best of 3: 3.13 usec per loop

$ python3.4 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k in d if d[k] == 1]'
1000000 loops, best of 3: 3.25 usec per loop

$ python3.4 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k, v in d.items() if v == 1]'
1000000 loops, best of 3: 3.05 usec per loop

Are lookups more expensive in Python 3.4 compared to 2.7 and can you explain why?

Zero Piraeus
  • 56,143
  • 27
  • 150
  • 160
Paulo Scardine
  • 73,447
  • 11
  • 124
  • 153
  • 1
    I can't reproduce similar results on my machine. If anything, the Python 3 `.items()` should be slightly faster as it doesn't build an intermediate list. – Simeon Visser Dec 16 '14 at 19:00
  • Note that the types are significantly different: for Python 2 you've got a dictionary mapping bytestrings to fixed-width ints, while for Python 3 you're mapping Unicode strings to arbitrary-precision ints. It could be that hashing a Unicode string is slower than hashing a bytestring, or that comparisons between long integers are slower than comparisons between fixed-width integers. I couldn't say for sure that that's where the timing difference comes from, though. – Mark Dickinson Dec 16 '14 at 19:15
  • @SimeonVisser honest question: you can't reproduce because the test is flawed, because you don't have both versions or because you got different results? :-) – Paulo Scardine Dec 17 '14 at 00:22
  • 1
    Different numbers that don't suggest 3 is slower than 2 here. – Simeon Visser Dec 17 '14 at 00:27

1 Answers1

5

It's not that lookups are more expensive1 in Python 3.4 than 2.7, but that dict.items() is cheaper. That's because dict.items() is a rather different beast in the two versions of the language:

$ python2.7
Python 2.7.9 (default, Dec 11 2014, 03:19:35) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type({}.items())
<type 'list'>
>>> 
$ python3.4
Python 3.4.2 (default, Oct  8 2014, 08:07:42) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type({}.items())
<class 'dict_items'>
>>> 

In Python 2, dict.items() constructs and returns a list, whereas in Python 3 it returns a dictionary view, and it turns out that iterating over this dynamic view into the dictionary is faster than constructing a list and then iterating over that.

Although dict.items() doesn't return a dictionary view in 2.7, it is possible to get one with dict.viewitems(), with similar performance benefits. Repeating your test, this time with viewitems() included, I get:

$ python2.7 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k in d if d[k] == 1]'
1000000 loops, best of 3: 3.38 usec per loop
$ python2.7 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k, v in d.items() if v == 1]'
1000000 loops, best of 3: 4.33 usec per loop
$ python2.7 -m timeit -n 1000000 \
  -s 'd={k:v for v, k in enumerate("abcdefghijhklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")}' \
  '[k for k, v in d.viewitems() if v == 1]'
1000000 loops, best of 3: 3.27 usec per loop

As regards the discussion that prompted your investigation, I'd say that the d.items() or d.viewitems() approach is more explicit, and therefore more pythonic, but that's really more an aesthetic choice than anything else.


1 Except inasmuch as Python 3.x is, as a general rule, slower than 2.x, but that's the price of progress ...

Zero Piraeus
  • 56,143
  • 27
  • 150
  • 160