9

I have a list of dictionaries that I would like to combine into one dictionary and add the values from each dictionary in the list. For example:

ds = [{1: 1, 2: 0, 3: 0}, {1: 2, 2: 1, 3: 0}, {1: 3, 2: 2, 3: 1, 4: 5}]

The final results should be a single dictionary:

merged = {1: 6, 2: 3, 3: 1, 4: 5}

I'm interested in performance and am looking for the fastest implementation that can merge a list of n-dictionaries into one dictionary and sum the values. An obvious implementation is:

from collections import defaultdict

merged = defaultdict(int)

for d in ds:
    for k, v in d.items():
        merged[k] += v

Is there a faster way to do this in Python 2.6?

John Y
  • 14,123
  • 2
  • 48
  • 72
user1728853
  • 2,607
  • 4
  • 30
  • 31

1 Answers1

6

defaultdict is still fastest, I found a few ways to speed it up by caching function names and now just found another way that sped it up significantly, by just iterating for k in d instead of using d.items() or d.iteritems()

Some timings so far:

from random import randrange
ds = [dict((randrange(1, 1000), randrange(1, 1000)) for i in xrange(500))
      for i in xrange(10000)]

# 10000 dictionaries of approx. length 500

from collections import defaultdict

def merge1(dicts, defaultdict=defaultdict, int=int):
    merged = defaultdict(int)
    for d in dicts:
        for k in d:
            merged[k] += d[k]
    return merged

def merge2(dicts):
    merged = {}
    merged_get = merged.get
    for d in dicts:
        for k in d:
            merged[k] = merged_get(k, 0) + d[k]
    return merged


def merge3(dicts):
    merged = {}
    for d in dicts:
        for k in d:
            merged[k] = merged[k] + d[k] if k in merged else 0
    return merged


from timeit import timeit
for func in ('merge1', 'merge2', 'merge3'):
    print func, timeit(stmt='{0}(ds)'.format(func),
                       setup='from __main__ import merge1, merge2, merge3, ds',
                       number=1)

merge1 0.992541510164
merge2 1.40478747997
merge3 1.23502204889
jamylak
  • 128,818
  • 30
  • 231
  • 230
  • 2
    Are you sure that's actually faster, though? (I ask because in the past, I've found summing Counter objects surprisingly slow.) – DSM Apr 23 '13 at 12:49
  • @DSM it doesn't matter it's a duplicate though, I think timings are in the other thread – jamylak Apr 23 '13 at 12:52
  • I should have mentioned, I am on Python 2.6 – user1728853 Apr 23 '13 at 12:52
  • @user1728853 Yeah you probably need to try both solutions with those optimizations, anyway that other thread already has the timings – jamylak Apr 23 '13 at 12:54