0

Consider the following:

mylist = ['a','b','c','d']
mydict = {}
for val in mylist:
    if val not in mydict:
        mydict[val] = []
    mydict[val].append(1)

Is there any way to avoid the double lookup ("val in mydict" and "mydict[val]")?

Note - I've tried using a default return value (i.e. mydict.get(val, []).append(1)), but the new list isn't actually registered in the dictionary as the value for the key. For example:

mydict = {}
mydict.get('not_in_dict','default')
mydict

returns {}. In my case I want something that would return {'not_in_dict' : 'default'}.

Or is the correct answer here that I should't worry about double lookups? (e.g. "chill, dude, don't optimize if you don't have to" or "python is awesome, it takes care of this already").

I'm using python 3.4.

user3666197
  • 1
  • 6
  • 50
  • 92
Gordon Bean
  • 4,272
  • 1
  • 32
  • 47
  • @Ashwini - thank you for pointing out the duplicate question. The answer there was just what I was looking for. – Gordon Bean Sep 30 '14 at 22:36

2 Answers2

2

You can use dict.setdefault

mylist = ['a','b','c','d']
mydict = {}
for val in mylist:
    mydict.setdefault(val,[])
    mydict[val].append(1)

Or use collections.defaultdict which is more efficient

from collections import defaultdict
mylist = ['a','b','c','d']
mydict = defaultdict(list)
for val in mylist:
    mydict[val].append(1)


In [14]: mylist = ['a','b','c','d']    
In [15]: mydict = {}    
In [16]: %%timeit
   ....: for val in mylist:
   ....:     mydict.setdefault(val,[])
   ....:     mydict[val].append(1)
   ....: 
1000000 loops, best of 3: 1.51 µs per loop

In [18]: mydict = defaultdict(list)
In [19]: %%timeit
   ....: for val in mylist:
   ....:     mydict[val].append(1)
   ....: 
1000000 loops, best of 3: 603 ns per loop
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
  • It appears that your example is still doing two lookups: mydict.setdefault(val,[]) and then again with mydict[val].append(1). Does the time change when you chain the two commands: mydict.setdefault(val,[]).append(1)? – Gordon Bean Sep 30 '14 at 22:39
  • the most efficient solution would be `defaultdict`, your original code is not far behind, checking if a key exists is constant time so it is pretty cheap to check if the keys exists so your own code is pretty efficient. – Padraic Cunningham Sep 30 '14 at 22:43
  • I guess then the answer really is that it doesn't matter all that much. I'm sure there is a python quote somewhere out there about premature optimization being the root of all evil. :) – Gordon Bean Sep 30 '14 at 22:45
  • There sure is, defaultdict is a handy tool though so worth learning about. As far as optimization goes I always get my code completed and working correctly then worry about optimisation if needed. – Padraic Cunningham Sep 30 '14 at 22:50
0
from collections import defaultdict

mydict = defaultdict(list)
mydict['not in dict'] = 'foo'
m.wasowski
  • 6,329
  • 1
  • 23
  • 30