I'm a Python beginner and struggling with the following:

I'm attempting to merge multiple lists of nested dictionaries that I've decoded from multiple JSON files. The common thread between the lists is the "uid" key in each nested dict, corresponding to a name, but the problem is that some dicts use different names for that key. For example, instead of "uid", a dict may use "number". I'd like to merge pieces of them together into a super nested-dictionary list of sorts. To illustrate, what I have is:

masterlist = [ ]

listA = [{"uid": "12345", "name": "John Smith"}, {etc...}]

listB = [{"number": "12345", "person": "John Smith", "val1": "25"}, {etc...}]

listC = [{"number": "12345", "person": "John Smith", "val2": "65"}, {etc...}]

What I'd like to end up with is:

masterlist = [{"uid": "12345", "name": "John Smith", "val1": "25", "val2": "65"}, {etc...}]

Is this possible to do efficiently/pythonically by iterating through and comparing for the identical "uid" value? I've seen a lot of how-tos on merging by matching keys, but the problem here, obviously, is that the keys are not consistent. Sorting doesn't matter. All I need is for the master list to contain the corresponding uid, name, and values for each dict entry. Hopefully that makes sense, and thank you!

thruxman
  • I should maybe clarify, as displayed above, I'm fine with retaining the key:value pairs for listA. So in that sense what I'm asking is how to append each nested dictionary in listA with val1 in listB and val2 in listC, matching based on the common "uid/number" value. – thruxman Mar 01 '19 at 18:55

4 Answers


There are probably solutions using base Python, but the simplest way I can think of is to use the pandas library to convert each list to a DataFrame, then join/merge them together.

import pandas as pd

dfA = pd.DataFrame(listA)
dfB = pd.DataFrame(listB)

merged_df = dfA.merge(dfB, left_on='uid', right_on='number')

That would return a DataFrame with more columns than you need (i.e. there would be columns for both "uid" and "number"), but you can specify which columns you want, and the order you want them in, this way:

merged_df = merged_df[['uid', 'name', 'val1']]

For merging multiple DataFrames into one master frame, see here: pandas three-way joining multiple dataframes on columns
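To sketch what that multi-frame merge looks like for the question's data: the snippet below (my own example, not from the linked answer) renames the inconsistent "number" columns to "uid" up front, then chains the merges with functools.reduce so every frame joins on the same column.

```python
from functools import reduce
import pandas as pd

listA = [{"uid": "12345", "name": "John Smith"}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"}]

# Rename the inconsistent key columns so every frame merges on "uid"
frames = [
    pd.DataFrame(listA),
    pd.DataFrame(listB).rename(columns={"number": "uid"}),
    pd.DataFrame(listC).rename(columns={"number": "uid"}),
]

# Chain the pairwise merges: ((A ⋈ B) ⋈ C), all on "uid"
merged = reduce(lambda left, right: left.merge(right, on="uid"), frames)

# Keep only the columns you care about and go back to a list of dicts
masterlist = merged[["uid", "name", "val1", "val2"]].to_dict("records")
```

With the sample data this yields `[{'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'}]`.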

Kevin Troy

Put all your input lists into one list of lists, so that you can build a dict mapping each uid to a dict of aggregated item values; your desired list of dicts is then simply the values of that mapping. To allow for inconsistent naming of the key across input dicts, pop the keys you don't want (such as number and id in my example) and re-assign the value under the key you want to keep (uid in the example):

wanted_key = 'uid'
unwanted_keys = {'number', 'id'}  # alternate names some dicts use for the uid key
mapping = {}
for l in lists:
    for d in l:
        # normalize the key name: move the value stored under an alternate key to 'uid'
        if wanted_key not in d:
            d[wanted_key] = d.pop(unwanted_keys.intersection(d).pop())
        # aggregate all entries that share the same uid into one dict
        mapping.setdefault(d[wanted_key], {}).update(d)
masterlist = list(mapping.values())

so that given:

lists = [
    [
        {"uid": "12345", "name": "John Smith"},
        {"uid": "56789", "name": "Joe Brown", "val1": "1"}
    ],
    [
        {"number": "12345", "name": "John Smith", "val1": "25"},
        {"number": "56789", "name": "Joe Brown", "val2": "2"}
    ],
    [
        {"id": "12345", "name": "John Smith", "val2": "65"}
    ]
]

masterlist becomes:

[
    {'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'},
    {'uid': '56789', 'name': 'Joe Brown', 'val1': '1', 'val2': '2'}
]
blhsing
  • Will this work though if the "uid" key name is not consistent across all lists though? In my case, the values "12345" are consistent across all lists but the keys have different naming conventions ("uid", "name", etc.). – thruxman Mar 01 '19 at 19:04

If you need to use different keys for each list, here is a solution that also uses an intermediate dict, with a function that takes the key representing the uid and one or more value keys to copy:

people_by_uid = {person["uid"]: person for person in listA}

def update_values(listX, uid_key, *val_keys):
    for entry in listX:
        person = people_by_uid[entry[uid_key]]
        for val_key in val_keys:
            person[val_key] = entry[val_key]

update_values(listB, "number", "val1")
update_values(listC, "number", "val2")

# e.g. if you had a listD from which you also needed val3 and val4:
update_values(listD, "number", "val3", "val4")

masterlist = list(people_by_uid.values())
Phydeaux

You can do this without pandas by first normalizing the key names with a dict comprehension, then grouping the list's dictionaries by their "uid" into a dictionary of dictionaries. You then take the .values() of that grouping dictionary to get a list of dictionaries again:

listA = [{"uid": "12345", "name": "John Smith"},{"uid": "67890", "name": "Jane Doe"}]

listB = [{"number": "12345", "person": "John Smith", "val1": "25"},{"number": "67890", "val1": "37"}]

listC = [{"number": "12345", "person": "John Smith", "val2": "65"},{"number": "67890", "val2": "53"}]

from collections import defaultdict
fn     = { "number":"uid", "person":"name" } # map to get uniform key names
data   = [ { fn.get(k,k):v for k,v in d.items() } for d in listA+listB+listC ]
result = next(r for r in [defaultdict(dict)] if [r[d["uid"]].update(d) for d in data])
print(*result.values())

{'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'} 
{'uid': '67890', 'name': 'Jane Doe', 'val1': '37', 'val2': '53'}
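For readers who find the next(...) construction above hard to follow, here is an equivalent explicit-loop version (my own rewrite of the same logic, using the same sample data and key-name map):

```python
from collections import defaultdict

listA = [{"uid": "12345", "name": "John Smith"}, {"uid": "67890", "name": "Jane Doe"}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"}, {"number": "67890", "val1": "37"}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"}, {"number": "67890", "val2": "53"}]

fn = {"number": "uid", "person": "name"}  # map to get uniform key names

result = defaultdict(dict)
for d in listA + listB + listC:
    # rename any alternate keys to the uniform names, then merge by uid
    normalized = {fn.get(k, k): v for k, v in d.items()}
    result[normalized["uid"]].update(normalized)

masterlist = list(result.values())
```

The defaultdict(dict) plays the same grouping role as in the one-liner: each new uid starts an empty dict, and successive .update() calls merge in the fields from every list.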
Alain T.