3

I've refactored how the merged-dictionary (all_classes) below is created, but I'm wondering if it can be more efficient.

I have a dictionary of dictionaries, like this:

groups_and_classes = {'group_1': {'class_A': [1, 2, 3],
                                  'class_B': [1, 3, 5, 7], 
                                  'class_c': [1, 2], # ...many more items like this
                                 },
                      'group_2': {'class_A': [11, 12, 13],
                                  'class_C': [5, 6, 7, 8, 9]
                                 }, # ...and many more items like this
                     }

A function creates a new object from groups_and_classes like this (the function to create this is called often):

all_classes = {'class_A': [1, 2, 3, 11, 12, 13],
               'class_B': [1, 3, 5, 7, 9],
               'class_C': [1, 2, 5, 6, 7, 8, 9]
              }

Right now, there is a loop that does this:

all_classes = {}
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        for v in vals:
            if all_classes.has_key(c):
                if v not in all_classes[c]:
                    all_classes[c].append(v)
            else:
                all_classes[c] = [v]

So far, I changed the code to use a set instead of a list since the order of the list doesn't matter and the values need to be unique:

all_classes = {}
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        try:
            all_classes[c].update(set(vals))
        except KeyError:
            all_classes[c] = set(vals)

This is a little nicer, and I didn't have to convert the sets to lists because of how all_classes is used in the code.

Question: Is there a more efficient way of creating all_classes (aside from building it at the same time groups_and_classes is built, and changing everywhere this function is called)?

Jason Coon
  • 17,601
  • 10
  • 42
  • 50
  • Is that the biggest bottleneck of you program that it needs optimizing? Those micro-optimizations are rarely needed. If, however, you do - the fastest way would be to code it in C. – Emil Ivanov Mar 12 '10 at 13:38

3 Answers3

4

Here's a tweak for conciseness, though I'm not sure about performance:

from collections import defaultdict
all_classes = defaultdict(set)
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        all_classes[c].update(set(vals))

Defaultdicts are not quite the greatest thing since sliced bread, but they're pretty cool. :)

Vicki Laidler
  • 3,415
  • 1
  • 20
  • 17
  • The problem with defaultdicts IMO is that whenever you reference a key that isn't in it, that key is added to the dict even if you were just getting the value associated with that key (empty set in this case) rather than setting the key to a value. – Justin Peel Mar 12 '10 at 15:14
  • Combining @Brian's suggestion below with this solution, it is the fastest out of the 3 (mine and The Machine Charmer being the other 2) – Jason Coon Mar 12 '10 at 15:23
  • @justin - good point. In my use case, I convert the defaultdict to a dict, so that it won't cause problems. – Jason Coon Mar 12 '10 at 15:25
2

One thing that might improve things slightly is to avoid the redundant conversion to a set, and just use:

all_classes[c].update(vals)

update can actually take an arbitrary iterable, as it essentially just iterates and adds, so you can avoid an extra conversion step.

Brian
  • 116,865
  • 28
  • 107
  • 112
2

Combining Dictionaries Of Lists In Python.

def merge_dols(dol1, dol2):
    result = dict(dol1, **dol2)
    result.update((k, dol1[k] + dol2[k]) for k in set(dol1).intersection(dol2))
    return result

g1 = groups_and_classes['group_1']
g2 = groups_and_classes['group_2']

all_classes = merge_dols(g1,g2)

OR

all_classes = reduce(merge_dols,groups_and_classes.values())

--copied from Alex Martelli

If you get more than two groups then you can use itertools.reduce

all_classes = reduce(merge_dols,groups_and_classes.values())
Community
  • 1
  • 1
Pratik Deoghare
  • 35,497
  • 30
  • 100
  • 146