I would like to know if there is a more efficient way to append to a dict value if the key exists, or to create the entry if it does not. At the moment I use `if key in set(dict.keys())`.

I have read many topics that mention `collections.defaultdict`, but is it efficient? I mean, when you use a `collections.defaultdict`, does Python perform an `if key in ...` check, or does it work differently?

My problem is that my dict keeps growing, so `if key in set(dict.keys())` takes longer to execute each time.

Here is an example of what I mean:

# a_list is a result of a loop that can iterate more than 10, 100, 1000...times
a_list = [[url1, sessions, transactions], [url2, sessions, transactions]...]
mydict = {}
for i in a_list:
    # if my key doesn't exist
    if i[0] not in set(mydict.keys()):
        mydict[i[0]] = {}
        mydict[i[0]]['session'] = i[1]
        mydict[i[0]]['transactions'] = i[2]

    else:
        # if my key exists
        mydict[i[0]]['session'] += i[1]
        mydict[i[0]]['transactions'] += i[2]

To be more precise, this script works with the Google Analytics API and is meant to avoid sampling: I make one request for each day of a month, so there is a good chance that my URLs (the `mydict` keys) are the same for each day I request.

jonrsharpe
Quentin
  • If you are aware of `if key in set(dict.keys())` and there is a good chance that the keys will be the same for each request, you can try to "cache" the keys you are sure exist (only the last key, or the last few keys), and skip the check against the big dict when a key is cached. It may improve performance a little if mydict is really big. – mucka Apr 13 '17 at 08:17
  • 3
    Yes, `defaultdict` will be much faster. Also, `if i[0] not in set(mydict.keys())` is *terribly* wasteful. A `dict` is already a hashmap, membership testing is a constant time operation, checking if a key does not exist should be done simply by `if i[0] not in mydict`. – juanpa.arrivillaga Apr 13 '17 at 08:17
  • 2
    *"I use a "if key in set(dict.keys())"* - why? `if key in dict` is `O(1)`, if you build a set every time that's `O(n)`! – jonrsharpe Apr 13 '17 at 08:17
  • you can use `mydict.has_key` instead of creating the key set on each iteration. – thedude Apr 13 '17 at 08:17
  • 3
    @thedude it is more idiomatic to use `key in my_dict` and `my_dict.has_key` isn't even in Python 3 anymore, so it's best not to use it. – juanpa.arrivillaga Apr 13 '17 at 08:18
  • See [this](http://stackoverflow.com/questions/12555967/is-the-defaultdict-in-pythons-collections-module-really-faster-than-using-setde) question. – juanpa.arrivillaga Apr 13 '17 at 08:20
  • Long story short: yes, a `defaultdict` is going to be much more efficient than what you're currently doing. The long story is https://www.youtube.com/watch?v=C4Kc8xzcA68 – jonrsharpe Apr 13 '17 at 08:21
  • @juanpa.arrivillaga I'm a Python beginner and do not understand all of the tricks, so thank you for your frank and helpful comment ;-). jonrsharpe, thank you too, I'll definitely check this video. – Quentin Apr 13 '17 at 08:26
  • A dict will never have duplicate keys, so there is no point in doing `set(my_dict.keys())`. All the items returned by the `keys()` method are unique. Also, you can just check `if key in my_dict`. – theBuzzyCoder Apr 13 '17 at 08:58
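
The point made in the comments above can be checked directly. The following is a minimal sketch, assuming a hypothetical `mydict` filled with made-up URL keys (none of this data comes from the question): `key in mydict` is a single hash lookup, whereas `key in set(mydict.keys())` copies every key into a new set before the lookup happens.

```python
import timeit

# Hypothetical data: 100,000 made-up URL keys, for illustration only.
mydict = {"/page/%d" % n: n for n in range(100000)}
key = "/page/99999"

# Both expressions give the same answer...
direct = key in mydict
via_set = key in set(mydict.keys())

# ...but the second one builds a brand-new set of all keys on every call.
t_direct = timeit.timeit(lambda: key in mydict, number=100)
t_via_set = timeit.timeit(lambda: key in set(mydict.keys()), number=100)
print("direct: %.6fs  via set: %.6fs" % (t_direct, t_via_set))
```

The absolute timings depend on the machine, but the gap should widen as the dict grows, which matches the slowdown described in the question.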

2 Answers

2

This is how you can use Python dictionaries with a list or any other data type as the value when you don't know whether the key exists in the dictionary.

In [26]: for i in a_list:
    ...:     my_dict.setdefault(i[0], {'session': 0, 'transaction': 0})
    ...:     my_dict[i[0]]['session'] += i[1]
    ...:     my_dict[i[0]]['transaction'] += i[2]
    ...:

The `setdefault` method only sets the default value if the key is not found in the dict; otherwise it leaves the existing value untouched.
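
To make that behaviour concrete, here is a small sketch using a hypothetical key `'url1'` (an illustration, not taken from the question's data). It shows that a second `setdefault` call returns the value already stored rather than replacing it.

```python
my_dict = {}

# Key missing: the default is stored and returned.
first = my_dict.setdefault('url1', {'session': 0, 'transaction': 0})
first['session'] += 5

# Key present: the stored value is returned and the new default is ignored.
second = my_dict.setdefault('url1', {'session': 0, 'transaction': 0})

print(second is first)  # True: both names refer to the same inner dict
print(my_dict)          # {'url1': {'session': 5, 'transaction': 0}}
```

Note that the default argument is still constructed on every call, even when it ends up unused.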

One more way to do it

In [44]: for i in a_list:
    ...:     # setdefault keeps an existing inner dict; assigning dict() here would reset the totals
    ...:     entry = my_dict.setdefault(i[0], dict())
    ...:     entry['session'] = entry.setdefault('session', 0) + i[1]
    ...:     entry['transaction'] = entry.setdefault('transaction', 0) + i[2]
    ...:

You don't have to check whether the key exists in the dict here.
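
For comparison, the `collections.defaultdict` mentioned in the question and comments can do the same job. The sketch below is one possible version, assuming hypothetical sample rows in the question's `[url, sessions, transactions]` layout; the helper name `new_entry` is made up for illustration. A `defaultdict` does not run an explicit `if key in ...` test in Python code: the default factory is called only when a key lookup fails.

```python
from collections import defaultdict

# Hypothetical sample rows in the question's [url, sessions, transactions] layout.
a_list = [['url1', 10, 1], ['url2', 5, 0], ['url1', 7, 2]]

def new_entry():
    # Called by defaultdict only when a missing key is looked up.
    return {'session': 0, 'transaction': 0}

my_dict = defaultdict(new_entry)
for url, sessions, transactions in a_list:
    my_dict[url]['session'] += sessions
    my_dict[url]['transaction'] += transactions

print(dict(my_dict))
# {'url1': {'session': 17, 'transaction': 3}, 'url2': {'session': 5, 'transaction': 0}}
```

One thing to keep in mind with this design: merely reading `my_dict[some_url]` for an unknown URL also creates an entry, so use `in` or `.get()` for lookups that should not modify the dict.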

theBuzzyCoder
0

This approach beats the `.setdefault()` approaches in terms of speed.

mydict = {}
for i in a_list:
    if i[0] not in mydict:
        mydict[i[0]] = {'session': 0, 'transactions': 0}
    mydict[i[0]]['session'] += i[1]
    mydict[i[0]]['transactions'] += i[2]

Benchmarked on my MacBook Air in IPython (Python 2.7.13 as well as Python 3.6.0) with a sample list of 1,000,000 items.
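
The timings themselves are not shown, so here is a rough sketch of how such a comparison might be reproduced with `timeit`. Everything about the data is an assumption: the made-up URLs, the smaller list of 100,000 rows, and the function names `with_in_check` and `with_setdefault` are all hypothetical. The `setdefault` variant is also only one of several possible ways to write it.

```python
import random
import timeit

# Hypothetical sample data: 100,000 rows spread over 1,000 made-up URLs.
random.seed(0)
a_list = [["/page/%d" % random.randrange(1000), 1, 1] for _ in range(100000)]

def with_in_check():
    mydict = {}
    for i in a_list:
        if i[0] not in mydict:
            mydict[i[0]] = {'session': 0, 'transactions': 0}
        mydict[i[0]]['session'] += i[1]
        mydict[i[0]]['transactions'] += i[2]
    return mydict

def with_setdefault():
    mydict = {}
    for i in a_list:
        entry = mydict.setdefault(i[0], {'session': 0, 'transactions': 0})
        entry['session'] += i[1]
        entry['transactions'] += i[2]
    return mydict

# Both functions must build the same result before timing means anything.
same = with_in_check() == with_setdefault()

# Report the best of three runs of five calls each.
for fn in (with_in_check, with_setdefault):
    print(fn.__name__, min(timeit.repeat(fn, number=5, repeat=3)))
```

Results will vary with the Python version and with how many distinct keys the data contains, so it is worth running this against data shaped like your own.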

RandomDude