What is the most efficient way to create nested dictionaries in Python?

Question

I currently have a list of over 10k dictionaries that looks like this:

cars = [{'model': 'Ford', 'year': 2010},
        {'model': 'BMW', 'year': 2019},
        ...]

And I have a second list:

car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
              {'model': 'BMW', 'name': 'Taylor', 'age': 34},
              .....]

I want to join the two together to get something like:

combined = [{'model': 'BMW',
             'year': 2019,
             'owners': [{'name': 'Sam', 'age': 34}, ...]
            }]

What is the best way to combine them? At the moment I am using a for loop, but I feel there must be a more efficient way of doing this.

** This is just a made-up example. My real data is a lot more complex, but this gives the idea of what I want to achieve.

I think this question is a duplicate: https://stackoverflow.com/questions/53601657/combine-multiple-dictionaries-into-one-pandas-dataframe-in-long-format — Shazter, Sep 10 '20 at 12:39
These are no dictionaries, but lists (of dictionaries). If at least one was a real dict, merging would be faster, because you wouldn't have to "search" for the matching model. — Wups, Sep 10 '20 at 12:50
For this amount of data, you'll want to start thinking about using something like SQLite (or some other database). You won't have the memory overhead of many `dict` objects, and you can use SQL to generate the desired combination. — chepner, Sep 10 '20 at 12:58
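
To illustrate the SQLite suggestion in the comment above, here is a minimal sketch using the standard-library `sqlite3` module. The in-memory database, the table names and the column types are assumptions made for this example, and the final grouping into nested dicts is still done in Python rather than in SQL.

```python
import sqlite3

cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
              {'model': 'BMW', 'name': 'Taylor', 'age': 34}]

# In-memory database; the schema below is an assumption for this sketch.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute('CREATE TABLE cars (model TEXT PRIMARY KEY, year INTEGER)')
conn.execute('CREATE TABLE car_owners (model TEXT, name TEXT, age INTEGER)')
conn.executemany('INSERT INTO cars VALUES (:model, :year)', cars)
conn.executemany('INSERT INTO car_owners VALUES (:model, :name, :age)', car_owners)

# LEFT JOIN keeps cars without owners (their owner columns come back as NULL).
rows = conn.execute('''
    SELECT c.model, c.year, o.name, o.age
    FROM cars AS c
    LEFT JOIN car_owners AS o ON o.model = c.model
    ORDER BY c.model, o.name
''')

# Fold the flat result rows back into one nested dict per model.
combined_by_model = {}
for row in rows:
    car = combined_by_model.setdefault(
        row['model'], {'model': row['model'], 'year': row['year'], 'owners': []})
    if row['name'] is not None:
        car['owners'].append({'name': row['name'], 'age': row['age']})

combined = list(combined_by_model.values())
conn.close()
```

Whether this beats plain dictionaries depends on the data size and on how often the combination is queried; for a one-off merge of 10k rows, the dictionary approach is likely simpler.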

DirtyBit · Answer 1 · 2020-09-10T12:52:20.393

Iterate over the first list to build a dict keyed by model. Then loop over the second list, look up each owner's model in that dict, and update the matching entry if it is found:

cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34}, {'model': 'Ford', 'name': 'Taylor', 'age': 34}]


dd = {x['model']:x for x in cars}

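for item in car_owners:
    key = item['model']
    # Copy the owner without the join key, leaving the input list unmodified.
    owner = {k: v for k, v in item.items() if k != 'model'}
    # Models missing from cars get a bare entry so their owners are not lost.
    car = dd.setdefault(key, {'model': key})
    # Append to a list so several owners of the same model are all kept.
    car.setdefault('car_owners', []).append(owner)

print(list(dd.values()))

OUTPUT:

[{'model': 'Ford', 'year': 2010, 'car_owners': [{'name': 'Taylor', 'age': 34}]}, {'model': 'BMW', 'year': 2019, 'car_owners': [{'name': 'Sam', 'age': 34}]}]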

maor10 · Answer 2 · 2020-09-10T13:03:13.377

Really, what you want performance-wise is a dictionary with the model as the key. That gives you O(1) average-case lookup, so you can fetch the requested element directly instead of looping over the list each time to find the car with model x. If you're starting off with lists, I'd build the dictionaries first; every lookup after that is O(1), so the whole merge takes a single pass over each list.

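# Index the cars by model for O(1) average-case lookup.
models_to_cars = {car['model']: car for car in cars}

# Group the owners by model, dropping the redundant 'model' key from each owner.
models_to_owners = {}
for car_owner in car_owners:
    owner = {k: v for k, v in car_owner.items() if k != 'model'}
    models_to_owners.setdefault(car_owner['model'], []).append(owner)

# Attach each car's owners; cars without owners get an empty list.
combined = [{
    **car,
    'owners': models_to_owners.get(model, [])
} for model, car in models_to_cars.items()]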

Then you'd have

combined = [{'model': 'BMW',
             'year': 2019,
             'owners': [{'name': 'Sam', 'age': 34}, ...]
            }]

as you wanted
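
For a self-contained check of this approach, here is a sketch that wraps the same idea in a function and uses `collections.defaultdict` for the grouping step. The function name `combine` and the sample data are assumptions for illustration, not part of the original answer. Unlike the dict-of-cars version, it iterates over `cars` directly, so two cars that share a model are both kept.

```python
from collections import defaultdict


def combine(cars, car_owners):
    """Return one dict per car with its owners nested under 'owners'."""
    # Group owners by model in a single pass over car_owners.
    owners_by_model = defaultdict(list)
    for owner in car_owners:
        owners_by_model[owner['model']].append(
            {k: v for k, v in owner.items() if k != 'model'})

    # One pass over cars; .get avoids adding empty entries to the defaultdict,
    # and list(...) gives each car its own copy of the owners list.
    return [{**car, 'owners': list(owners_by_model.get(car['model'], []))}
            for car in cars]


cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
              {'model': 'BMW', 'name': 'Taylor', 'age': 34}]

combined = combine(cars, car_owners)
```

With this sample data, the Ford entry gets an empty `owners` list and the BMW entry gets both owners, and neither input list is modified.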
