
If a list contains any number of nested dictionaries whose values and key depths vary:

original_list = [
    {'animal': {'mammal': {'herbivore': 'zebra'}}},
    {'animal': {'mammal': {'herbivore': 'deer'}}},
    {'animal': {'mammal': {'carnivore': 'tiger'}}},
    {'animal': {'mammal': {'herbivore': 'lion'}}},
    {'furniture': {'chair'}}
]

How can I aggregate the values that share the same nested keys to obtain a result such as:

[
    {'animal': {'mammal': {'herbivore': 'zebra', 'deer'}}},
    {'animal': {'mammal': {'carnivore': 'tiger', 'lion'}}},
    {'furniture': 'chair'}
]

or a more condensed view such as:

[
    {'animal':
        {'mammal':
            {'herbivore': ['zebra', 'deer'],
             'carnivore': ['tiger', 'lion']}
        }
    },
    {'furniture': ['chair']}
]

or

[
    {'animal': {'mammal': {'herbivore': ['zebra', 'deer']}}},
    {'animal': {'mammal': {'carnivore': ['tiger', 'lion']}}},
    {'furniture': ['chair']}
]

I have tried this:

from collections import defaultdict
d = defaultdict(list)
for item in original_list:
    for k, v in item.items():
        d[k].append(v)

But that only aggregates at the top level, not at the inner levels, so `d` ends up as:

{
    'animal':
    [
        {'mammal': {'herbivore': 'zebra'}},
        {'mammal': {'herbivore': 'deer'}},
        {'mammal': {'carnivore': 'tiger'}},
        {'mammal': {'herbivore': 'lion'}}
    ],
    'furniture': [{'chair'}]
}
crypticgamer
  • Have you tried this? https://stackoverflow.com/questions/5946236/how-to-merge-multiple-dicts-with-same-key-or-different-key – sagar1025 Apr 29 '21 at 01:11
  • The last item of the input is supposed to be `{'furniture': 'chair'}`, not `{'furniture': {'chair'}}`, right? Or do you actually mean to deal with a set? And the output for this item should be `{'furniture': ['chair']}`, not `{'furniture': {'chair'}}`. – blhsing Apr 29 '21 at 01:14
  • It's supposed to be `{'furniture': ['chair']}`. I also updated the post with a (possibly) better, more readable output: `[{'animal': {'mammal': {'herbivore': ['zebra', 'deer'], 'carnivore': ['tiger', 'lion']}}}, {'furniture': ['chair']}]` – crypticgamer Apr 29 '21 at 19:44

2 Answers


Since the nested dicts in the input have variable depths, you can't handle them with a fixed number of nested for loops. Instead, first record the path of keys leading to each leaf value in a list. Convert that path to a tuple (so that it is hashable) and use a mapping dict to map each path to the list that the current value should be appended to. If the path isn't in the mapping dict yet, build a new record as a dict: pop the last key off the path first, then use a for loop with a temporary `node` variable pointing at the current sub-dict to initialize every remaining level as a dict, and finally initialize the last level as a list. Append the current value to the list, whether it came from the mapping dict or from the newly built record:

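output = []
mapping = {}  # tuple of keys -> list collecting the leaf values for that path
for record in original_list:
    # Follow the single-key dicts down to the leaf, recording the keys on the way.
    path = []
    while True:
        key, value = next(iter(record.items()))
        path.append(key)
        if not isinstance(value, dict):
            break
        record = value
    signature = tuple(path)  # tuples are hashable, lists are not
    if signature not in mapping:
        # First time this path is seen: build a new nested record for it.
        node = {}
        output.append(node)
        last_key = path.pop()
        for key in path:
            # Store a new sub-dict under node[key], then descend into it.
            node[key] = node = {}
        # The last level is a list, shared with the mapping dict.
        node[last_key] = mapping[signature] = []
    mapping[signature].append(value)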
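The line `node[key] = node = {}` relies on Python's chained assignment: the `{}` on the right is evaluated once and then assigned to each target from left to right, so the new dict is stored under `key` in the *current* node before `node` is rebound to it. A small standalone check of that order (the names `root` and `node` here are just for illustration):

```python
# Chained assignment: {} is created once, then assigned to each
# target from left to right.
root = {}
node = root
for key in ['animal', 'mammal']:
    # First node[key] = new_dict (using the old node),
    # then node = new_dict (descend one level).
    node[key] = node = {}
node['herbivore'] = []

print(root)  # {'animal': {'mammal': {'herbivore': []}}}
```

Swapping the targets (`node = node[key] = {}`) would rebind `node` first and attach the new dict to itself, so the order matters.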

With your sample input (after correcting lion to be a carnivore and furniture to be a string rather than a set containing a string), `output` becomes:

[{'animal': {'mammal': {'herbivore': ['zebra', 'deer']}}},
 {'animal': {'mammal': {'carnivore': ['tiger', 'lion']}}},
 {'furniture': ['chair']}]

Demo: https://replit.com/@blhsing/HorizontalFluidObjects

blhsing
  • Thank you for explaining the solution. However, could this process be extended to cluster further and remove the repetition? For example: `[{'animal': {'mammal': {'herbivore': ['zebra', 'deer'], 'carnivore': ['tiger', 'lion']}}}, {'furniture': ['chair']}]` – crypticgamer Apr 29 '21 at 18:37
  • The solution in [this post](https://stackoverflow.com/a/57117778) solves the issue when it is applied to `output`, but there could be a better solution. – crypticgamer Apr 29 '21 at 21:13
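For the fully condensed form requested in the comments, one option is to skip the per-path records entirely and merge every input dict into a single tree recursively, appending leaf values to lists. This is an independent sketch, not necessarily what the linked post does. It assumes the corrected sample data (lion as a carnivore, furniture as a plain string) and that a given key never holds a dict in one record and a leaf value in another. `merge_into` and `merge_records` are hypothetical helper names:

```python
def merge_into(tree, record):
    """Recursively merge one nested dict into tree, collecting leaf values in lists."""
    for key, value in record.items():
        if isinstance(value, dict):
            # Inner level: reuse the existing sub-dict for this key, or create one.
            merge_into(tree.setdefault(key, {}), value)
        else:
            # Leaf: append to the list stored under this key, creating it if needed.
            tree.setdefault(key, []).append(value)
    return tree


def merge_records(records):
    """Merge all records into one tree, then split the top level into one-key dicts."""
    tree = {}
    for record in records:
        merge_into(tree, record)
    return [{key: value} for key, value in tree.items()]


records = [
    {'animal': {'mammal': {'herbivore': 'zebra'}}},
    {'animal': {'mammal': {'herbivore': 'deer'}}},
    {'animal': {'mammal': {'carnivore': 'tiger'}}},
    {'animal': {'mammal': {'carnivore': 'lion'}}},
    {'furniture': 'chair'},
]

result = merge_records(records)
print(result)
```

Because `merge_into` iterates over `record.items()` instead of taking only the first key, it also copes with inner dicts that have more than one key.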

The data structure you are using (dicts with only 1 key!?) doesn't make a lot of sense; you should consider using a properly nested dict, a single dict with tuple keys, or even a tree structure. But for what you requested, you could use the following:

from collections import defaultdict

# Map each full key path (as a tuple) to the list of leaf values found at that path.
d = defaultdict(list)

for entry in original_list:
    key = []
    # Walk down the single-key dicts, collecting the keys on the way.
    while isinstance(entry, dict):
        key.append(next(iter(entry)))
        entry = next(iter(entry.values()))
    d[tuple(key)].append(entry)

# Rebuild a nested single-key dict for each path, from the innermost key outward.
out = []
for k in d:
    entry = {k[-1]: d[k]}
    for k0 in k[-2::-1]:
        entry = {k0: entry}
    out.append(entry)

print(out)
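The intermediate `d` is already the "single dict with tuple keys" mentioned at the top of this answer, and it can be queried directly by path. A small sketch of that, assuming the corrected sample data (lion as a carnivore, furniture as a plain string); `group_by_path` is a hypothetical name for the first loop above:

```python
from collections import defaultdict


def group_by_path(records):
    """Group leaf values of single-key nested dicts by their full key path."""
    d = defaultdict(list)
    for entry in records:
        key = []
        # Descend through the single-key dicts, collecting the keys.
        while isinstance(entry, dict):
            key.append(next(iter(entry)))
            entry = next(iter(entry.values()))
        d[tuple(key)].append(entry)
    # Return a plain dict so that looking up a missing path raises KeyError.
    return dict(d)


records = [
    {'animal': {'mammal': {'herbivore': 'zebra'}}},
    {'animal': {'mammal': {'herbivore': 'deer'}}},
    {'animal': {'mammal': {'carnivore': 'tiger'}}},
    {'animal': {'mammal': {'carnivore': 'lion'}}},
    {'furniture': 'chair'},
]

by_path = group_by_path(records)
print(by_path[('animal', 'mammal', 'carnivore')])  # ['tiger', 'lion']
print(by_path[('furniture',)])  # ['chair']
```

Converting to a plain `dict` is a deliberate choice here: indexing a `defaultdict(list)` with a path that doesn't exist would silently insert an empty list for it.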
xjcl