63

I'm trying to get a list of all keys in a list of dictionaries in order to fill out the fieldnames argument for csv.DictWriter.

Previously, I had something like this:

[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

and I was using fieldnames = list[0].keys() to take the first dictionary in the list and extract its keys.

Now I have something like this where one of the dictionaries has more key:value pairs than the others (could be any of the results). The new keys are added dynamically based on information coming from an API so they may or may not occur in each dictionary and I don't know in advance how many new keys there will be.

[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5, "height":4},
{"name": "Pam", "age": 7}
]

I can't just use fieldnames = list[1].keys() since it isn't necessarily the second element that will have extra keys.

A simple solution would be to find the dictionary with the greatest number of keys and use it for the fieldnames, but that won't work if you have an example like this:

[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5, "height":4},
{"name": "Pam", "age": 7, "weight":90}
]

where both the second and third dictionaries have 3 keys, but the end result should really be the list ["name", "age", "height", "weight"].

o.h

7 Answers

98
all_keys = set().union(*(d.keys() for d in mylist))

Edit: have to unpack the list. Now fixed.
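To connect this back to the question, here is a minimal sketch of feeding the union of keys into csv.DictWriter. It assumes the three-dictionary sample from the question as mylist, writes to an in-memory io.StringIO buffer rather than a real file, and sorts the keys only because a set has no defined order; none of those choices come from the answer itself.

```python
import csv
import io

mylist = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "height": 4},
    {"name": "Pam", "age": 7, "weight": 90},
]

# Union of every key; sorted() is only there to make the column order reproducible.
fieldnames = sorted(set().union(*(d.keys() for d in mylist)))

buffer = io.StringIO()  # stand-in for a file opened with newline=""
# restval is the value written for keys a row does not have (empty string by default).
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(mylist)
csv_text = buffer.getvalue()
```

Rows that lack a key simply get an empty cell in that column, so the dictionaries do not need to be padded beforehand.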

Hugh Bothwell
  • This solution works perfectly, but it seems to produce a list of keys that have a different order than the list of dictionaries they were extracted from. Any idea how to keep the indexing? Thank you! – Momchill Mar 18 '22 at 16:31
  • @Momchill order is not guaranteed because he is using a set. I will post a snippet below for you that uses a list. – mareoraft Jan 11 '23 at 17:46
33

Your data:

>>> LoD
[{'age': 10, 'name': 'Tom'}, 
 {'age': 5, 'name': 'Mark', 'height': 4}, 
 {'age': 7, 'name': 'Pam', 'weight': 90}]

This set comprehension will do it:

>>> {k for d in LoD for k in d.keys()}
{'age', 'name', 'weight', 'height'}

It works this way. First, create a list of lists of the dict keys:

>>> [list(d.keys()) for d in LoD]
[['age', 'name'], ['age', 'name', 'height'], ['age', 'name', 'weight']]

Then create a flattened version of this list of lists:

>>> [i for s in [d.keys() for d in LoD] for i in s]
['age', 'name', 'age', 'name', 'height', 'age', 'name', 'weight']

And create a set to eliminate duplicates:

>>> set([i for s in [d.keys() for d in LoD] for i in s])
{'age', 'name', 'weight', 'height'}

Which can be simplified to:

{k for d in LoD for k in d.keys()}
dawg
5
from itertools import chain

lis = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "height":4},
    {"name": "Pam", "age": 7, "weight":90}
]

# without qualification a dict iterates over its keys
# and set takes any iterable in its constructor
headers_as_set = set(chain.from_iterable(lis))

# you asked for a list
headers = list(
    set(chain.from_iterable(lis))
)
bwv549
4
>>> lis=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5, "height":4},
{"name": "Pam", "age": 7, "weight":90}
]
>>> {z for y in (x.keys() for x in lis) for z in y}
set(['age', 'name', 'weight', 'height'])
Ashwini Chaudhary
3

Borrowing lis from @AshwiniChaudhary's answer, here is an explanation of how you could solve your problem.

>>> lis=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5, "height":4},
{"name": "Pam", "age": 7, "weight":90}
]

Iterating directly over a dict yields its keys, so you don't have to call keys() to get them. In Python 2 this saves a function call and a list construction per element in your list; in Python 3, keys() returns a lightweight view, so only the call is saved.

>>> {k for d in lis for k in d}
set(['age', 'name', 'weight', 'height'])

or use itertools.chain:

>>> from itertools import chain
>>> {k for k in chain(*lis)}
set(['age', 'name', 'weight', 'height'])
octopusgrabbus
PaulMcG
2

The following example will extract the keys, where dictionaries is your list of dicts:

set_ = set()
for dict_ in dictionaries:
    set_.update(dict_.keys())
print(set_)
octopusgrabbus
user1277476
0

If order matters to you, read on...

Input your data:

>>> list_of_dicts = [{'age': 10, 'name': 'Tom'},{'age': 5, 'name': 'Mark', 'height': 4}, {'age': 7, 'name': 'Pam', 'weight': 90}]

Define your function:

>>> def get_all_keys_in_order(list_of_dicts):
        ordered_keys = []
        for dict_ in list_of_dicts:
            for key in dict_:
                if key not in ordered_keys:
                    ordered_keys.append(key)
        return ordered_keys

Run your function to get output:

>>> get_all_keys_in_order(list_of_dicts)
['age', 'name', 'height', 'weight']
mareoraft
  • @Momchill I think this solves your problem. Please note that this algorithm is slower than the set solution, which could be a problem if you are working with big data. But for small data there is no problem. – mareoraft Jan 11 '23 at 17:55
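If the slowdown mentioned above matters, one possible alternative (a sketch, not part of the answer) is to let a dict do the de-duplication. It relies on dicts preserving insertion order, which the language guarantees from Python 3.7 onward. The function name all_keys_ordered is made up for this illustration.

```python
list_of_dicts = [
    {"age": 10, "name": "Tom"},
    {"age": 5, "name": "Mark", "height": 4},
    {"age": 7, "name": "Pam", "weight": 90},
]

def all_keys_ordered(list_of_dicts):
    """Return each distinct key once, in first-seen order (Python 3.7+)."""
    # dict.fromkeys keeps the first occurrence of every key and drops repeats;
    # each membership check is a hash lookup instead of a scan through a list.
    return list(dict.fromkeys(key for d in list_of_dicts for key in d))

ordered_keys = all_keys_ordered(list_of_dicts)
```

For the sample data this gives the same order as get_all_keys_in_order: age, name, height, weight.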