33

I have a list of dictionaries:

data = [
    {
        'a': {'l': 'Apple',
              'b': 'Milk',
              'd': 'Meatball'},
        'b': {'favourite': 'coke',
              'dislike': 'juice'}
    },
    {
        'a': {'l': 'Apple1',
              'b': 'Milk1',
              'd': 'Meatball2'},
        'b': {'favourite': 'coke2',
              'dislike': 'juice3'}
    }, ...
]

I need to merge the nested dictionaries of each element into a single dictionary to get the expected output:

 [{'d': 'Meatball', 'b': 'Milk', 'l': 'Apple', 'dislike': 'juice', 'favourite': 'coke'},
  {'d': 'Meatball2', 'b': 'Milk1', 'l': 'Apple1', 'dislike': 'juice3', 'favourite': 'coke2'}]

I tried a nested list comprehension, but it only flattens the list and does not merge the dicts:

L = [y for x in data for y in x.values()]
print(L)

[{'d': 'Meatball', 'b': 'Milk', 'l': 'Apple'},
 {'dislike': 'juice', 'favourite': 'coke'},
 {'d': 'Meatball2', 'b': 'Milk1', 'l': 'Apple1'},
 {'dislike': 'juice3', 'favourite': 'coke2'}]

I am looking for the fastest solution.

jezrael
  • [how-to-merge-two-dictionaries-in-a-single-expression](https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression) would be helpful. – pe-perry Feb 09 '18 at 07:43

4 Answers

25

You can do the following, using itertools.chain:

>>> from itertools import chain
# timeit: ~3.40
>>> [dict(chain(*map(dict.items, d.values()))) for d in data]
[{'l': 'Apple', 
  'b': 'Milk', 
  'd': 'Meatball', 
  'favourite': 'coke', 
  'dislike': 'juice'}, 
 {'l': 'Apple1', 
  'b': 'Milk1', 
  'dislike': 'juice3', 
  'favourite': 'coke2', 
  'd': 'Meatball2'}]

The combination of chain, map, and * makes this expression shorthand for the following doubly nested comprehension, which actually performs better on my system (Python 3.5.2) and isn't much longer:

# timeit: ~2.04
[{k: v for x in d.values() for k, v in x.items()} for d in data]
# Or, not using items, but lookup by key
# timeit: ~1.67
[{k: x[k] for x in d.values() for k in x} for d in data]

Note:

RoadRunner's loop-and-update approach outperforms all of these one-liners, at timeit: ~1.37

user2390182
  • 2
    I like your fairness in adding the timing of another solution that is faster, so no re-accepting ;) Thank you. – jezrael Feb 09 '18 at 08:09
  • 2
    @jezrael Thx, nah, no need to hide that fact. It is interesting to compare these 3 stylistically so different approaches and to see that the straightforward loop beats the comprehension and particularly the kitchen sink of built-ins and itertools :) – user2390182 Feb 09 '18 at 08:14
  • 2
    A more readable alternative to `dict(chain(*map(...` is `[ChainMap(*d.values()) for d in data]`. It's slower than the other methods, though. – Eric Duminil Feb 09 '18 at 11:01
  • Why do I get `TypeError: descriptor 'items' requires a 'dict' object but received a 'str'` when I try to flatten the output of `[{k: v for x in d.values() for k, v in x.items()} for d in data]`, which is a list of dictionaries? That output has the same shape as the input data. – abdoulsn Dec 06 '19 at 09:56
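
The `ChainMap` alternative mentioned in the comments can be sketched as follows. This is a minimal illustration using the sample data from the question; wrapping the result in `dict(...)` is an addition here, so that the output is a list of plain dicts rather than `ChainMap` views. One difference worth keeping in mind: on duplicate keys, `ChainMap` returns the value from the first mapping, while `dict.update()` and the comprehensions keep the last one.

```python
from collections import ChainMap

# Sample input copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]

# ChainMap is a merged view over the inner dicts; dict(...) copies it
# into an ordinary dictionary.
merged = [dict(ChainMap(*d.values())) for d in data]
print(merged)

# Illustrative input (not from the question) with a duplicate key:
# ChainMap looks the key up in the FIRST mapping that contains it.
clash = {'a': {'k': 'first'}, 'b': {'k': 'last'}}
print(dict(ChainMap(*clash.values())))  # {'k': 'first'}
```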
23

You can do this with two nested loops, using dict.update() to merge the inner dictionaries into a temporary dictionary and appending that dictionary to the result list at the end of each outer iteration:

L = []
for d in data:
    temp = {}
    for key in d:
        temp.update(d[key])

    L.append(temp)

# timeit ~1.4
print(L)

Which outputs:

[{'l': 'Apple', 'b': 'Milk', 'd': 'Meatball', 'favourite': 'coke', 'dislike': 'juice'}, {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2', 'favourite': 'coke2', 'dislike': 'juice3'}]
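
The same loop can be wrapped in a small function, which also makes the behaviour on duplicate keys easy to check. This is only a sketch: the name `flatten_records` and the `overlap` input are hypothetical and not part of the original answer. Because `update()` is applied in iteration order, a key that appears in more than one inner dict ends up with the value from the last one.

```python
def flatten_records(records):
    """Merge the inner dicts of each record into one flat dict.

    Hypothetical helper name; the logic is the loop-and-update approach.
    """
    flat = []
    for record in records:
        merged = {}                      # fresh dict, so the input is not mutated
        for inner in record.values():    # iterate over the inner dicts directly
            merged.update(inner)         # later inner dicts overwrite earlier keys
        flat.append(merged)
    return flat


# Sample input copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]
result = flatten_records(data)
print(result)

# Illustrative input with a key shared by both inner dicts.
overlap = [{'a': {'k': 1, 'x': 0}, 'b': {'k': 2}}]
print(flatten_records(overlap))  # [{'k': 2, 'x': 0}]
```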
RoadRunner
7

You can use functools.reduce along with a simple list comprehension to flatten the list of dicts:

>>> from functools import reduce 

>>> data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
>>> [reduce(lambda x,y: {**x,**y},d.values()) for d in data]
[{'dislike': 'juice', 'l': 'Apple', 'd': 'Meatball', 'b': 'Milk', 'favourite': 'coke'}, {'dislike': 'juice3', 'l': 'Apple1', 'd': 'Meatball2', 'b': 'Milk1', 'favourite': 'coke2'}]
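
One edge case of the `reduce` approach may be worth guarding against: without an initial value, `reduce` raises `TypeError` when an outer dictionary has no inner dicts at all. A possible variant, assuming empty records can occur in your data, passes `{}` as the initial accumulator. The helper name `merge_inner` is hypothetical.

```python
from functools import reduce


def merge_inner(record):
    """Merge all inner dicts of one record (hypothetical helper name)."""
    # The third argument {} is the initial accumulator, so an empty
    # record yields {} instead of raising TypeError.
    return reduce(lambda acc, inner: {**acc, **inner}, record.values(), {})


# Sample input copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]
flattened = [merge_inner(d) for d in data]
print(flattened)
print(merge_inner({}))  # {}
```

Note that `{**acc, **inner}` builds a new dictionary at every step, which is one plausible reason this version does not beat the in-place `update()` loop in the timings.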

Time benchmark is as follows:

>>> import timeit
>>> setup = """
... from functools import reduce
... data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
... """
>>> min(timeit.Timer("[reduce(lambda x,y: {**x,**y},d.values()) for d in data]",setup=setup).repeat(3,1000000))
1.525032774952706

Timings of the other answers on my machine:

>>> setup = """
... data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
... """
>>> min(timeit.Timer("[{k: v for x in d.values() for k, v in x.items()} for d in data]",setup=setup).repeat(3,1000000))
2.2488374650129117

>>> min(timeit.Timer("[{k: x[k] for x in d.values() for k in x} for d in data]",setup=setup).repeat(3,1000000))
1.8990078769857064

>>> code = """
... L = []
... for d in data:
...     temp = {}
...     for key in d:
...         temp.update(d[key])
...     L.append(temp)
... """

>>> min(timeit.Timer(code,setup=setup).repeat(3,1000000))
1.4258553800173104

>>> setup = """
... from itertools import chain
... data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
... """
>>> min(timeit.Timer("[dict(chain(*map(dict.items, d.values()))) for d in data]",setup=setup).repeat(3,1000000))
3.774383604992181
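
To rerun the comparison on your own machine, the approaches can be collected into one script. The following is a rough sketch, not the harness used for the numbers above: the function names are made up for illustration, the repetition count is reduced to 10,000, and wrapping each approach in a function adds call overhead, so absolute figures will not match the ones quoted. It first checks that every approach produces the same result and then prints the best of three runs.

```python
import timeit
from functools import reduce
from itertools import chain

# Sample input copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]


def with_chain(data):
    return [dict(chain(*map(dict.items, d.values()))) for d in data]


def with_items(data):
    return [{k: v for x in d.values() for k, v in x.items()} for d in data]


def with_lookup(data):
    return [{k: x[k] for x in d.values() for k in x} for d in data]


def with_update(data):
    result = []
    for d in data:
        temp = {}
        for key in d:
            temp.update(d[key])
        result.append(temp)
    return result


def with_reduce(data):
    return [reduce(lambda x, y: {**x, **y}, d.values()) for d in data]


candidates = [with_chain, with_items, with_lookup, with_update, with_reduce]
expected = with_update(data)

for func in candidates:
    # Sanity check: every approach must produce the same flattened list.
    assert func(data) == expected
    # Best of 3 runs of 10,000 calls each; results vary by machine and version.
    seconds = min(timeit.repeat(lambda: func(data), repeat=3, number=10_000))
    print(f"{func.__name__:12s} {seconds:.4f}s")
```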
Sohaib Farooqi
4

If your nested dictionaries only ever have the keys 'a' and 'b', I suggest the following solution, which I find fast and very easy to read:

L = [x['a'] for x in data]
b = [x['b'] for x in data]

for i in range(len(L)):
    L[i].update(b[i])

# timeit ~1.4

print(L)
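
One thing to be aware of: `L` holds references to the original `'a'` dictionaries, so `L[i].update(b[i])` also modifies `data` in place. If the input must stay unchanged, a possible variant under the same assumption (only the keys `'a'` and `'b'`) builds new dictionaries with unpacking, which requires Python 3.5 or later. It is a sketch and has not been timed against the other answers.

```python
# Sample input copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]

# Build a new dict per element; on duplicate keys the value from 'b' wins.
merged = [{**x['a'], **x['b']} for x in data]
print(merged)

# The inner dicts of the input are left untouched.
print(data[0]['a'])  # {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}
```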
Laurent H.