
Say I have a list of dictionaries. They mostly have the same keys in each row, but a few don't match and have extra key/value pairs. Is there a fast way to get a set of all the keys in all the rows?

Right now I'm using this loop:

def get_all_keys(dictlist):
    keys = set()
    for row in dictlist:
        # union() returns a new set containing the old keys plus this row's keys
        keys = keys.union(row.keys())
    return keys

It just seems terribly inefficient to do this on a list with hundreds of thousands of rows, but I'm not sure how to do it better.

Thanks!

Joe Pinsonault

5 Answers


You could try:

def all_keys(dictlist):
    return set().union(*dictlist)

This avoids imports and makes the most of the underlying implementation of `set`. It will also work with any iterable of iterables, not just a list of dicts.
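A quick sketch of how this behaves on rows with mismatched keys. The `rows` sample data is made up for illustration; note that an empty list simply gives an empty set, because `set().union()` with no arguments returns a copy of the empty set.

```python
def all_keys(dictlist):
    # set.union accepts any number of iterables; iterating a dict yields its keys
    return set().union(*dictlist)

# Hypothetical sample data: most rows share keys, one has an extra pair
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "extra": True},
]

print(sorted(all_keys(rows)))  # ['extra', 'id', 'name']
print(all_keys([]))            # set()
```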

Jon Clements
  • Thanks! That works, but I'm not sure why. Can you help me understand what the asterisk is doing in this case? How is it that it only extracts the keys from `dictlist`? – Joe Pinsonault Jun 03 '13 at 18:04
  • Sure... The `*` unpacks the list into separate arguments to [set.union](http://docs.python.org/2/library/stdtypes.html#set.union), which can take any number of iterable arguments (so the above call is effectively `set().union(first_dict, second_dict, third_dict, fourth_dict, ...)`). For each object in the list, it then iterates over it: a `dict` yields its keys, a list/tuple its items, a string its characters, etc. – Jon Clements Jun 03 '13 at 18:10
  • Ahh, thank you. This helps me understand what the asterisk is useful for too. – Joe Pinsonault Jun 03 '13 at 18:11
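To make the explanation in the comments concrete, here is a small sketch (the dicts `a`, `b`, `c` are hypothetical) showing that the starred call is the same as passing each dict explicitly, and that `union()` iterates whatever it receives:

```python
a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}
c = {"w": 5}
dictlist = [a, b, c]

# The starred call unpacks the list into separate positional arguments...
unpacked = set().union(*dictlist)
# ...so it is equivalent to passing each dict explicitly
explicit = set().union(a, b, c)
assert unpacked == explicit == {"w", "x", "y", "z"}

# union() iterates each argument: dicts yield keys, strings yield characters
print(sorted(set().union({"k": 0}, "ab", ("c",))))  # ['a', 'b', 'c', 'k']
```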

A fun one, which works on Python 3.x[1], relies on `reduce` and the fact that `dict.keys()` now returns a set-like object:

>>> from functools import reduce
>>> dicts = [{1:2},{3:4},{5:6}]
>>> reduce(lambda x,y:x | y.keys(),dicts,{})
{1, 3, 5}

For what it's worth,

>>> reduce(lambda x,y:x | y.keys(),dicts,set())
{1, 3, 5}

works too, or, if you want to avoid a lambda (and the initializer), you could even do:

>>> import operator
>>> reduce(operator.or_, (d.keys() for d in dicts))
{1, 3, 5}
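One caveat with dropping the initializer: `reduce` raises `TypeError` when given an empty sequence and no initial value. A minimal sketch (the function name `all_keys` is just illustrative) that keeps `operator.or_` but passes `set()` as the initializer, so an empty list yields an empty set:

```python
import operator
from functools import reduce

def all_keys(dicts):
    # Start from an empty set so an empty `dicts` returns set() instead of raising
    return reduce(operator.or_, (d.keys() for d in dicts), set())

print(all_keys([{1: 2}, {3: 4}, {5: 6}]))  # {1, 3, 5}
print(all_keys([]))                        # set()

# Without the initializer, the empty case fails
try:
    reduce(operator.or_, (d.keys() for d in []))
except TypeError as exc:
    print("no initializer:", exc)
```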

Very neat.

This really shines when you only have two dicts. Instead of something like `set(a) | set(b)`, you can write `a.keys() | b.keys()`, which seems a little nicer to me.
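A short Python 3 sketch of the two-dict case (the dicts `a` and `b` are hypothetical). Key views support the other set operators as well, and each result is an ordinary `set`:

```python
a = {"id": 1, "name": "a"}
b = {"id": 2, "email": "b@example.com"}

# Keys present in either dict
print(sorted(a.keys() | b.keys()))  # ['email', 'id', 'name']
# Keys present in both dicts
print(a.keys() & b.keys())          # {'id'}
# Keys present in exactly one of the dicts
print(sorted(a.keys() ^ b.keys()))  # ['email', 'name']
```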


[1] It can be made to work on Python 2.7 as well: use `dict.viewkeys` instead of `dict.keys`.

mgilson

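You can do:

from itertools import chain
def get_all_keys(dictlist):
    # iterating a dict yields its keys, so this flattens every row's keys into one set
    return set(chain.from_iterable(dictlist))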

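As @Jon Clements noted, this keeps only the data it needs in memory, because `chain.from_iterable` consumes `dictlist` lazily, whereas using the `*` operator with either `chain` or `union` first unpacks all of `dictlist` into an argument tuple. The difference matters most when `dictlist` is a generator rather than a list.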
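A sketch of where the lazy approach pays off: feeding it a generator. Here `read_rows` is a hypothetical stand-in for something like a CSV reader or database cursor; with `set().union(*read_rows(...))`, all 100,000 dicts would be materialized into a tuple before the union starts, while `chain.from_iterable` pulls one row at a time.

```python
from itertools import chain

def get_all_keys(dictlist):
    # chain.from_iterable pulls one dict at a time and yields its keys
    return set(chain.from_iterable(dictlist))

def read_rows(n):
    # Hypothetical row source: a generator, so rows are produced on demand
    for i in range(n):
        row = {"id": i, "value": i * 2}
        if i % 1000 == 0:
            row["note"] = "checked"  # the occasional extra key
        yield row

print(sorted(get_all_keys(read_rows(100_000))))  # ['id', 'note', 'value']
```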

Elazar

Sets are like dictionaries in that they have an `update()` method, so this would work in your loop:

keys.update(row.iterkeys())  # Python 2; on Python 3 use row.keys(), or simply keys.update(row)

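Here is the question's whole loop rewritten around `update()`, as a Python 3 sketch. It passes the dict itself rather than `row.iterkeys()`, which does not exist in Python 3; the `rows` data is made up.

```python
def get_all_keys(dictlist):
    keys = set()
    for row in dictlist:
        # update() adds to the existing set in place; iterating a dict yields its keys
        keys.update(row)
    return keys

rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4, "c": 5}]
print(sorted(get_all_keys(rows)))  # ['a', 'b', 'c']
```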
martineau

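If you're worried about performance, you should stop calling `dict.keys()`, since in Python 2 it creates a list in memory. You can also use `set.update()` instead of `union`; `update()` adds to the existing set in place instead of building a new set on every iteration, but I don't know whether that makes it measurably faster than `set.union()`.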
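If you want to check the `union` vs `update` question on your own data, a rough `timeit` sketch like this would do it. The generated `rows` are hypothetical test data, and no timings are shown here because they depend on the machine and Python version.

```python
import timeit

# Hypothetical test data: 100,000 small rows, every 100th with an extra key
rows = [{"id": i, "x": i} if i % 100 else {"id": i, "x": i, "extra": i}
        for i in range(100_000)]

def with_union(dictlist):
    keys = set()
    for row in dictlist:
        keys = keys.union(row.keys())  # builds a new set every iteration
    return keys

def with_update(dictlist):
    keys = set()
    for row in dictlist:
        keys.update(row)  # mutates the same set in place
    return keys

# Both approaches must agree before timing them
assert with_union(rows) == with_update(rows) == {"id", "x", "extra"}

for fn in (with_union, with_update):
    seconds = timeit.timeit(lambda: fn(rows), number=5)
    print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```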

lenz