23

I am using Python 2.7.5 on Mac OS X 10.9.3 with 8 GB memory and a 1.7 GHz Core i5. I have measured the time consumption as shown below.

d = {i:i*2 for i in xrange(10**7*3)} #WARNING: it takes time and consumes a lot of RAM

%time for k in d: k,d[k]
CPU times: user 6.22 s, sys: 10.1 ms, total: 6.23 s
Wall time: 6.23 s

%time for k,v in d.iteritems(): k, v
CPU times: user 7.67 s, sys: 27.1 ms, total: 7.7 s
Wall time: 7.69 s

It seems `iteritems()` is slower. I am wondering what the advantage of `iteritems()` is over directly accessing the dict.

Update: for a more accurate time profile:

In [23]: %timeit -n 5 for k in d: v=d[k]
5 loops, best of 3: 2.32 s per loop

In [24]: %timeit -n 5 for k,v in d.iteritems(): v
5 loops, best of 3: 2.33 s per loop
Noumenon
czheo
  • That... was not what I expected to see – mhlester Jun 12 '14 at 16:28
  • This [answer](http://stackoverflow.com/a/10458567/2297365) might be good to read over. – huu Jun 12 '14 at 16:29
  • @HuuNguyen It's not really relevant. He isn't asking about `items()` vs `iteritems()`, but about a simple loop + key lookup vs `iteritems()`. (I also didn't get this immediately, since `items()` vs `iteritems()` comes up quite often...) – Bakuriu Jun 12 '14 at 16:31
  • I would add a Python version to it. 2 and 3 differ quite a lot in this regard AFAIK. – luk32 Jun 12 '14 at 16:33
  • @Bakuriu Ah, I see the distinction. Yeah, I initially thought this was a simple, and expected, tradeoff of CPU vs memory. – huu Jun 12 '14 at 16:33
  • I cannot reproduce the timings. On my machine `iteritems()` is *faster*. Also you'd have to consider that for a fair comparison you should use: `for k,v in d.iteritems(): pass` vs `for k in d: v = d[k]`, since you want to compare the speed of iterations that provide the same information. – Bakuriu Jun 12 '14 at 16:43
  • @Bakuriu `for k,v in d.iteritems(): v`(Wall time: 5.1 s) is slower than `for k in d: d[k]`(Wall time: 4.44 s) on my machine – czheo Jun 12 '14 at 16:50
  • @czheo, you're not doing an assignment in the second loop. Do `v = d[k]` instead of just `d[k]`. – huu Jun 12 '14 at 16:55
  • I get different results when timing with %timeit vs %time. Which is the most accurate? – M4rtini Jun 12 '14 at 16:58
  • @Bakuriu With the assignment, `iteritems()` wins a little. But the advantage is so small that I still hope someone could give me a more persuasive reason for implementing it in the Python core. – czheo Jun 12 '14 at 17:03
  • @M4rtini You may refer to [time vs timeit](http://stackoverflow.com/questions/17579357/time-time-vs-timeit-timeit) – czheo Jun 12 '14 at 17:08
  • @czheo that's right. With the updated timing code, i get that iteritems is almost 50% faster. 1.62s vs 1.09s – M4rtini Jun 12 '14 at 17:16
  • @czheo Note that `iteritems()` is gone in python3. And note that Python3's `items()` is *not* equivalent to `iteritems()` but to `viewitems()` which provides a lot of other functionalities that you don't have with your loop version (e.g. `x.items() & y.items()` returns the intersection of the items in common between `x` and `y`. – Bakuriu Jun 12 '14 at 17:30

5 Answers

15

To answer your question we should first dig up some information about how and when iteritems() was added to the API.

The iteritems() method was added in Python 2.2, following the introduction of iterators and generators in the language (see also: What is the difference between dict.items() and dict.iteritems()?). In fact the method is explicitly mentioned in PEP 234. So it was introduced as a lazy alternative to the already present items().

This followed the same pattern as file.xreadlines() versus file.readlines(); xreadlines() was introduced in Python 2.1 (and already deprecated in Python 2.3, by the way).

In Python 2.3 the itertools module was added, which introduced lazy counterparts to map, filter, etc.
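
As a minimal Python 2 sketch (the data below is purely illustrative), those lazy counterparts can be used like this:

from itertools import imap, ifilter

squares = imap(lambda x: x * x, xrange(10**6))          # lazy: nothing is computed yet
evens = ifilter(lambda x: x % 2 == 0, xrange(10**6))    # also lazy
print next(squares), next(evens)                        # values are produced on demand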

In other words, at the time there was (and there still is) a strong trend towards laziness of operations. One of the reasons is to improve memory efficiency. Another one is to avoid unneeded computation.

I cannot find any reference that says it was introduced to improve the speed of looping over the dictionary. It was simply used to replace calls to items() that didn't actually have to return a list. Note that this includes more use cases than just a simple for loop.

For example in the code:

function(dictionary.iteritems())

you cannot simply use a for loop to replace iteritems() as in your example. You'd have to write a function (or use a genexp, even though they weren't available when iteritems() was introduced, and they wouldn't be DRY...).
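
As a rough illustration (function and dictionary are just the placeholders used above), the hand-written lazy equivalent would look something like:

# a generator expression re-implementing what iteritems() already provides
function((k, dictionary[k]) for k in dictionary)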

Retrieving the items from a dict is done pretty often so it does make sense to provide a built-in method and, in fact, there was one: items(). The problem with items() is that:

  • it isn't lazy, meaning that calling it on a big dict can take quite some time
  • it takes a lot of memory; it can almost double the memory usage of a program if called on a very big dict that contains most of the objects being manipulated
  • most of the time it is iterated over only once

So, when introducing iterators and generators, it was obvious to just add a lazy counterpart. If you need a list of items because you want to index it or iterate more than once, use items(), otherwise you can just use iteritems() and avoid the problems cited above.
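
A small sketch of the difference (using a toy dict purely for illustration):

d = {i: i * 2 for i in xrange(5)}

pairs_list = d.items()      # builds the full list of (key, value) tuples up front
pairs_iter = d.iteritems()  # returns a lazy iterator; no list is materialized

print pairs_list[0]         # a list can be indexed and iterated many times
print next(pairs_iter)      # an iterator yields one item at a time, and only once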

The advantages of using iteritems() are the same as using items() versus manually getting the value:

  • You write less code, which makes it more DRY and reduces the chances of errors
  • Code is more readable.

Plus the advantages of laziness.


As I already stated, I cannot reproduce your performance results. On my machine iteritems() is always faster than iterating + looking up by key. The difference is quite negligible anyway, and it's probably due to how the OS is handling caching and memory in general. In other words, your argument about efficiency isn't a strong argument against (or for) using one or the other alternative.

Given equal performance on average, use the most readable and concise alternative: iteritems(). This discussion would be similar to asking "why use a foreach when you can just loop by index with the same performance?". The importance of foreach isn't that you iterate faster, but that you avoid writing boilerplate code and improve readability.


I'd like to point out that iteritems() was in fact removed in Python 3. This was part of the "cleanup" in that version. Python 3's items() method is (mostly) equivalent to Python 2's viewitems() method (actually a backport, if I'm not mistaken...).

This version is lazy (and thus provides a replacement for iteritems()) and also has further functionality, such as providing "set-like" operations (e.g. finding the common items between dicts in an efficient way). So in Python 3 the reasons to use items() instead of manually retrieving the values are even more compelling.
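
For example, here is a small Python 3 sketch of those set-like view operations (the dicts are hypothetical):

x = {'a': 1, 'b': 2, 'c': 3}
y = {'b': 2, 'c': 30, 'd': 4}

print(x.items() & y.items())   # items common to both dicts: {('b', 2)}
print(x.keys() & y.keys())     # keys common to both dicts: {'b', 'c'}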

Bakuriu
  • Now with latest Python 3 subversions, `viewitems()` is the Py2 backport of Py3's `items()`, because both provide a live view, whereas `iteritems()` provides a snapshot at the time the method was called. – gaborous Jul 29 '17 at 11:16
14

Using for k,v in d.iteritems() with more descriptive names can make the code in the loop suite easier to read.
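
For instance (with a hypothetical dict and names), compare:

phone_numbers = {'alice': '555-0100', 'bob': '555-0199'}

for name, number in phone_numbers.iteritems():
    print '%s can be reached at %s' % (name, number)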

wwii
  • I initially downvoted this for not being an answer to the question. Reviewing the question though, the OP is asking "what is the advantage of iteritems over directly accessing the dict", so technically this is a valid answer. – huu Jun 12 '14 at 16:40
  • Given the timing posted by the OP, I asked myself - why do I use this idiom? This definitely was at the top of the list. – wwii Jun 12 '14 at 16:45
9

As opposed to using the %time command, running in ipython with timeit yields:

d = {i:i*2 for i in xrange(10**7*3)} #WARNING: it takes time and consumes a lot of RAM

timeit for k in d: k, d[k]
1 loops, best of 3: 2.46 s per loop

timeit for k, v in d.iteritems(): k, v
1 loops, best of 3: 1.92 s per loop

I ran this on Windows, Python 2.7.6. Have you run it multiple times to confirm it wasn't something going on with the system itself?

acushner
6

I know technically this is not an answer to the question, but the comments section is a poor place to put this sort of information. I hope that this helps people better understand the nature of the problem being discussed.

For thoroughness I've timed a bunch of different configurations. These are all timed using timeit with a repetition factor of 10. This is using CPython version 2.7.6 on Mac OS X 10.9.3 with 16GB memory and 2.3GHz Core i7.

The original configuration

python -m timeit -n 10 -s 'd={i:i*2 for i in xrange(10**7*3)}' 'for k in d: k, d[k]'
>> 10 loops, best of 3: 2.05 sec per loop

python -m timeit -n 10 -s 'd={i:i*2 for i in xrange(10**7*3)}' 'for k, v in d.iteritems(): k, v'
>> 10 loops, best of 3: 1.74 sec per loop

Bakuriu's suggestion

This suggestion involves using `pass` in the iteritems() loop, and assigning a value to a variable v in the first loop by accessing the dictionary at k.

python -m timeit -n 10 -s 'd={i:i*2 for i in xrange(10**7*3)}' 'for k in d: v = d[k]'
>> 10 loops, best of 3: 1.29 sec per loop

python -m timeit -n 10 -s 'd={i:i*2 for i in xrange(10**7*3)}' 'for k, v in d.iteritems(): pass'
>> 10 loops, best of 3: 934 msec per loop

No assignment in the first

This one removes the assignment in the first loop but keeps the dictionary access. This is not a fair comparison because the second loop creates an additional variable and assigns it a value implicitly.

python -m timeit -n 10 -s 'd={i:i*2 for i in xrange(10**7*3)}' 'for k in d: d[k]'
>> 10 loops, best of 3: 1.27 sec per loop

Interestingly, the assignment is trivial compared to the access itself -- the difference being a mere 20 msec total. In every comparison (even the final, unfair one), iteritems wins out.

The times are closest, percentage-wise, in the original configuration. This is probably due to the bulk of the work being the creation of a tuple (which is not assigned anywhere). Once that is removed from the equation, the difference between the two methods becomes more pronounced.

huu
0

d.items() wins out heavily in Python 3.5.

Here is a small performance stat:

import timeit

d = {i:i*2 for i in range(10**3)}
timeit.timeit('for k in d: k,d[k]', globals=globals())
75.92739052970501
timeit.timeit('for k, v in d.items(): k,v', globals=globals())
57.31370617801076
nesdis