Union of all keys from a list of dictionaries

Question

Say I have a list of dictionaries. They mostly have the same keys in each row, but a few don't match and have extra key/value pairs. Is there a fast way to get a set of all the keys in all the rows?

Right now I'm using this loop:

def get_all_keys(dictlist):
    keys = set()
    for row in dictlist:
        keys = keys.union(row.keys())
    return keys

It just seems terribly inefficient to do this on a list with hundreds of thousands of rows, but I'm not sure how to do it better.

Thanks!

`set([row.keys() for row in dictlist])` is *not* what you wanted. Besides, it results in a `TypeError`. — Elazar, Jun 03 '13 at 17:55

score 11 · Accepted Answer · answered Jun 03 '13 at 17:39


You could try:

def all_keys(dictlist):
    return set().union(*dictlist)

This avoids imports and makes the most of the underlying `set` implementation. It also works with any iterable of iterables, not just a list of dicts.
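A quick check of `set().union(*dictlist)` on a small, made-up list of rows (the `rows` data is hypothetical, for illustration only):

```python
def all_keys(dictlist):
    # set.union accepts any number of iterables; iterating a dict yields its keys
    return set().union(*dictlist)

# Hypothetical sample data: most rows share keys, one has an extra key.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "extra": True},
]

print(sorted(all_keys(rows)))  # ['extra', 'id', 'name']
print(all_keys([]))            # set() -- an empty list gives an empty set
```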


Jon Clements


Thanks! That works, but I'm not sure why. Can you help me understand what the asterisk is doing in this case? How is it that it only extracts the keys from `dictlist`? – Joe Pinsonault Jun 03 '13 at 18:04

Sure... The `*` unpacks the list into separate arguments to [set.union](http://docs.python.org/2/library/stdtypes.html#set.union), which can take any number of iterable arguments, so the above call is effectively `set().union(first_dict, second_dict, third_dict, fourth_dict, ...)`. `union` then iterates over each of those objects: iterating a `dict` yields its keys, a list/tuple its items, a string its characters, and so on. – Jon Clements Jun 03 '13 at 18:10
Ahh, thank you. This helps me understand what the asterisk is useful for too. – Joe Pinsonault Jun 03 '13 at 18:11
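A small sketch of what the unpacking in that explanation amounts to, using made-up sample dicts:

```python
dicts = [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]

# Unpacking with * passes each dict as a separate positional argument...
unpacked = set().union(*dicts)
# ...so it is the same call as spelling the arguments out by hand.
explicit = set().union(dicts[0], dicts[1], dicts[2])
print(unpacked == explicit)  # True
print(sorted(unpacked))      # ['a', 'b', 'c']

# union() iterates each argument: a dict yields its keys, a list or tuple
# its items, and a string its characters.
mixed = set().union({"k": 0}, [1, 2], "xy")
print(mixed == {"k", 1, 2, "x", "y"})  # True
```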

mgilson · Answer 2 · 2013-06-03T18:15:09.883


A fun one, which works on Python 3.x¹, relies on `reduce` and the fact that `dict.keys()` now returns a set-like object:

>>> from functools import reduce
>>> dicts = [{1:2},{3:4},{5:6}]
>>> reduce(lambda x,y:x | y.keys(),dicts,{})
{1, 3, 5}

For what it's worth,

>>> reduce(lambda x,y:x | y.keys(),dicts,set())
{1, 3, 5}

works too, or, if you want to avoid a lambda (and the initializer), you could even do:

>>> import operator
>>> reduce(operator.or_, (d.keys() for d in dicts))
{1, 3, 5}

Very neat.

This really shines when you only have two dicts: instead of something like `set(a) | set(b)`, you can write `a.keys() | b.keys()`, which seems a little nicer to me.

¹ It can be made to work on Python 2.7 as well: use `dict.viewkeys` instead of `dict.keys`.
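For the two-dict case, a sketch (Python 3, with made-up dicts `a` and `b`) showing that the key views support the other set operators too, and that each result is an ordinary `set`:

```python
a = {"id": 1, "name": "x"}
b = {"id": 2, "email": "y"}

# dict.keys() returns a set-like view in Python 3, so set operators work
# directly on the views and produce plain sets.
print(sorted(a.keys() | b.keys()))  # ['email', 'id', 'name']  (union)
print(sorted(a.keys() & b.keys()))  # ['id']                   (intersection)
print(sorted(a.keys() - b.keys()))  # ['name']                 (only in a)
print(sorted(a.keys() ^ b.keys()))  # ['email', 'name']        (in exactly one)
print(type(a.keys() | b.keys()).__name__)  # set
```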

edited Jun 03 '13 at 18:15

answered Jun 03 '13 at 17:52

mgilson



Not convinced you need method calls for either... `reduce(set.union, dicts, set())` should just work I believe... – Jon Clements Jun 03 '13 at 17:54

@JonClements -- True. My thought here was more to demonstrate the set-like nature of `dict.keys` in python3.x – mgilson Jun 03 '13 at 17:56
Umm, okay - `reduce(operator.or_, (d.keys() for d in dicts))` ? – Jon Clements Jun 03 '13 at 17:59
@JonClements -- Yeah, I like that one. I'll update using that. – mgilson Jun 03 '13 at 18:13
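A sketch comparing the two `reduce` variants from this thread on the answer's sample data, plus two edge cases worth knowing about when there is no initializer:

```python
import operator
from functools import reduce

dicts = [{1: 2}, {3: 4}, {5: 6}]

# Jon Clements' variant: set.union accepts any iterable, including a dict.
via_set_union = reduce(set.union, dicts, set())

# The operator.or_ variant relies on Python 3's set-like dict.keys() views.
via_or = reduce(operator.or_, (d.keys() for d in dicts))

print(via_set_union == via_or == {1, 3, 5})  # True

# With a single dict and no initializer, reduce() never calls or_ and
# returns the dict_keys view itself rather than a set.
single = reduce(operator.or_, (d.keys() for d in [{7: 8}]))
print(type(single).__name__)  # dict_keys

# With an empty input, only the form with a set() initializer succeeds.
print(reduce(set.union, [], set()))  # set()
try:
    reduce(operator.or_, (d.keys() for d in []))
except TypeError as exc:
    print("TypeError:", exc)
```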

Elazar · Answer 3 · 2013-06-03T17:53:26.273


You can do:

from itertools import chain

def all_keys(dictlist):
    return set(chain.from_iterable(dictlist))

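As @Jon Clements noted, `chain.from_iterable` pulls the dicts from `dictlist` one at a time, so only the data currently needed is kept in memory. Using the `*` operator with either `chain` or `union` first unpacks the whole iterable into an argument tuple.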
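A sketch of where that laziness matters: feeding the rows from a generator. `generate_rows` is a hypothetical stand-in for something like a file reader or database cursor, not part of the original answer:

```python
from itertools import chain

def all_keys(dicts):
    # chain.from_iterable takes one dict at a time from `dicts` and yields
    # its keys, so `dicts` can be a lazy generator that is never turned
    # into a list or an argument tuple.
    return set(chain.from_iterable(dicts))

def generate_rows(n):
    # Hypothetical row source: most rows share keys, a few have an extra one.
    for i in range(n):
        row = {"id": i, "value": i * 2}
        if i % 1000 == 0:
            row["note"] = "extra"
        yield row

print(sorted(all_keys(generate_rows(100_000))))  # ['id', 'note', 'value']
```

By contrast, `set().union(*generate_rows(100_000))` would have to build a tuple of all 100,000 dicts before `union` even starts. As @gnibbler points out in the comments, this makes no difference when the dicts are already in a list.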

edited Jun 03 '13 at 17:53

answered Jun 03 '13 at 17:35

Elazar


What does `chain` do here? – Blender Jun 03 '13 at 17:35

That won't work -- it would try to make a set from each dictionary. You'd need `set(chain.from_iterable(dictlist))` or something. – DSM Jun 03 '13 at 17:36

I'd go with @DSM here - definitely `chain.from_iterable` - it's basically `chain(*dictlist)` but more optimised (and a bit more explicit IMHO)... – Jon Clements Jun 03 '13 at 17:42
@JonClements, I don't see an advantage of `chain.from_iterable` here. Since the dicts and dictlist all exist already, there is no saving. – John La Rooy Jun 03 '13 at 17:58
@gnibbler dictlist may be any iterable. Not only a list. – Elazar Jun 03 '13 at 18:00
@Blender can you explain your question? – Elazar Jun 03 '13 at 18:04
@Elazar, that also works with tuple unpacking of *args. The key difference is just when the iterator providing the dicts is consumed. – John La Rooy Jun 03 '13 at 18:04
@Elazar: `chain(dictlist) == dictlist`, so `chain` wasn't doing anything. – Blender Jun 03 '13 at 18:04
Here's one related post: http://stackoverflow.com/questions/15004772/what-is-the-difference-between-chain-and-chain-from-iterable-in-itertools – Jon Clements Jun 03 '13 at 18:13
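A small sketch of the distinction discussed in these comments, with made-up sample dicts:

```python
from itertools import chain

dicts = [{"a": 1}, {"b": 2}]

# chain(dicts) treats the list as a single iterable and yields the dicts
# themselves, so building a set from it fails because dicts are unhashable.
try:
    set(chain(dicts))
except TypeError as exc:
    print("TypeError:", exc)

# chain(*dicts) and chain.from_iterable(dicts) both yield the keys of each
# dict; from_iterable takes the outer iterable directly instead of unpacking it.
print(sorted(chain(*dicts)))               # ['a', 'b']
print(sorted(chain.from_iterable(dicts)))  # ['a', 'b']
```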

martineau · Answer 4 · 2013-06-03T17:58:20.030


Sets are like dictionaries in that they have an `update()` method, so this would work in your loop:

keys.update(row.iterkeys())  # Python 2; in Python 3, use keys.update(row)
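A sketch of the question's loop rewritten around `set.update()`. It targets Python 3, where `iterkeys()` no longer exists, so the dict is passed directly. This is an illustration, not the answerer's exact code:

```python
def get_all_keys(dictlist):
    keys = set()
    for row in dictlist:
        # update() adds to the existing set in place; the original loop's
        # keys.union(...) built a brand-new set on every iteration.
        # Passing the dict itself iterates its keys (works in Python 2 and 3).
        keys.update(row)
    return keys

print(sorted(get_all_keys([{"x": 1}, {"x": 2, "y": 3}])))  # ['x', 'y']
```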

edited Jun 03 '13 at 17:58

answered Jun 03 '13 at 17:34

martineau


score 0 · Answer 5 · answered Jun 03 '13 at 17:37


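If you're worried about performance, you should stop calling `dict.keys()`, since in Python 2 it creates a new list in memory for every row. You can also use `set.update()` instead of `union()`. `update()` adds to the existing set in place rather than building a new set on each iteration, but I don't know if it is faster than `set.union()`.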
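One way to settle the "is it faster?" question on your own data is a rough `timeit` comparison. The sketch below assumes Python 3 and uses hypothetical synthetic `rows` and function names. Timings vary by machine and Python version, so no numbers are quoted here:

```python
import timeit
from itertools import chain

# Hypothetical test data: 100,000 rows sharing most keys, a few with extras.
rows = [{"id": i, "name": "n", "value": i} for i in range(100_000)]
for i in range(0, len(rows), 10_000):
    rows[i]["extra"] = True

def with_union_loop(dictlist):
    keys = set()
    for row in dictlist:
        keys = keys.union(row)  # builds a new set every iteration
    return keys

def with_update_loop(dictlist):
    keys = set()
    for row in dictlist:
        keys.update(row)  # mutates the one set in place
    return keys

def with_single_union(dictlist):
    return set().union(*dictlist)

def with_chain(dictlist):
    return set(chain.from_iterable(dictlist))

candidates = [with_union_loop, with_update_loop, with_single_union, with_chain]

# All approaches must agree before their timings mean anything.
expected = with_update_loop(rows)
assert all(f(rows) == expected for f in candidates)

for f in candidates:
    seconds = timeit.timeit(lambda: f(rows), number=5)
    print(f"{f.__name__:>20}: {seconds:.3f} s for 5 runs")
```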


lenz

