Match set of dictionaries. Most elegant solution. Python

Question

Given two lists of dictionaries, new one and old one. Dictionaries represent the same objects in both lists.
I need to find differences and produce new list of dictionaries where will be objects from new dictionaries only and updated attributes from old dictionaries.
Example:

   list_new=[
             { 'id':1,
               'name':'bob',
               'desc': 'cool guy'
              },
             
             { 'id':2,
               'name':'Bill',
               'desc': 'bad guy'
              },

              { 'id':3,
               'name':'Vasya',
               'desc': None
              },
        ]

    list_old=[
             { 'id':1,
               'name':'boby',
               'desc': 'cool guy',
                'some_data' : '12345'
              },
             { 'id':2,
               'name':'Bill',
               'desc': 'cool guy',
               'some_data' : '12345'

              },
              { 'id':3,
               'name':'vasya',
               'desc': 'the man',
               'some_data' : '12345'
              },
              { 'id':4,
               'name':'Elvis',
               'desc': 'singer',
               'some_data' : '12345'
              },
            ]

In that example I want produce new list where will be only new guys from list_new with updated data. Matched by id. So Bob will become Boby, Bill will become coll guy, Vasya become - the man. End Elvis have to be absent.

Give me an elegant solution. With less amount of iteration loops.

There is way, I resolve that. Which is not the best:

 def match_dict(new_list, old_list)
    ids_new=[]
    for item in new_list:
            ids_new.append(item['id'])
    result=[] 
    for item_old in old_medias:
        if item_old['id'] in ids_new:
            for item_new in new_list:
                if item_new['id']=item_old['id']
                    item_new['some_data']=item_old['some_data']
                    result.append(item_new)
    return result

The reason why I'm doubt, because there is loop inside loop. If there will be lists of 2000 items the process would take same time.

Are you retrieving this list from somewhere? Can you restructure the list of dictionaries using the __id__ as a key to a dictionary? — Mahmoud Abdelkader, Mar 09 '11 at 21:51
I tried to use your code to diff your output with mine, but it doesn't work (syntax errors etc.). Please fix, thanks. — Paolo, Mar 10 '11 at 00:18
The dictionaries going from mongodb. I'm trying to make it editable through django admin interface. I have typical django formset and dont want it to push each dict separately, It would make a lot of hits to database per one save in a page with formset. So i want to get it, match it, and than push with one hit. — Pol, Mar 10 '11 at 05:29

score 3 · Answer 1 · answered Mar 09 '11 at 22:08

3

Can't quite get it to one line, but here's a simpler version:

def match_new(new_list, old_list) :
    ids = dict((item['id'], item) for item in new_list)
    return [ids[item['id']] for item in old_list if item['id'] in ids]

answered Mar 09 '11 at 22:08

koblas

25,410
6
39
49

score 2 · Answer 2 · edited Jun 15 '21 at 11:11

Not knowing the constraints of your data, I will suppose that id is unique in each list, and that your list contains only imutable types (string, int,...) which are hashable.

# first index each list by id
new = {item['id']: item for item in list_new}
old = {item['id']: item for item in list_old}

# now you can see which ids appeared in the new list
created = set(new.keys())-set(old.keys())
# or which ids were deleted
deleted =  set(old.keys())-set(new.keys())
# or which ids exists in the 2 lists
intersect = set(new.keys()).intersection(set(old.keys()))

# using the same 'conversion to set' trick,
# you can see what is different for each item
diff = {id: dict(set(new[id].items())-set(old[id].items())) for id in intersect}

# using your example data set, diff now contains the differences for items which exists in the two lists:
# {1: {'name': 'bob'}, 2: {'desc': 'bad guy'}, 3: {'name': 'Vasya', 'desc': None}}

# you can now add the new ids to this diff
diff.update({id: new[id] for id in created})
# and get your data back into the original format:
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

this is using python 3 syntax, but should be easily ported to python 2.

edit: here is the same code written for python 2.5:

new = dict((item['id'],item) for item in list_new)
old = dict((item['id'],item) for item in list_old)

created = set(new.keys())-set(old.keys())
deleted =  set(old.keys())-set(new.keys())
intersect = set(new.keys()).intersection(set(old.keys()))

diff = dict((id,dict(set(new[id].items())-set(old[id].items()))) for id in intersect)

diff.update(dict(id,new[id]) for id in created))
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

(note how the code is less readable without the dict comprehension)

It's pretty good. There is 5 loops. But x*5 is less than x*x. if x sometime can equal 300. Thanks. — Pol, Mar 10 '11 at 14:27

score 1 · Answer 3 · answered Mar 09 '11 at 21:53

1

for each dictionary in old_list, search for the dictionary in new_list with the same id, then do: old_dict.update(new_dict)

eliminate each new_dict, after updating, from new_list and append the remaining, unused dicts after the loop.

answered Mar 09 '11 at 21:53

joaquin

82,968
29
138
152

score 1 · Answer 4 · edited May 23 '17 at 12:19

1

Something like this is what you need:

l = []
for d in list_old:
    for e in list_new:
        if e['id'] == d['id']:
            l.append(dict(e, **d))
print l

Read here on how to merge dictionaries.

edited May 23 '17 at 12:19

Community

1
1

answered Mar 09 '11 at 21:59

atx

4,831
3
26
40

Andrew Clark · Answer 5 · 2011-03-10T17:05:51.757

You could do something like this:

def match_dict(new_list, old_list):
    new_dict = dict((obj['id'], obj) for obj in new_list)
    old_dict = dict((obj['id'], obj) for obj in old_list)
    for k in new_dict.iterkeys():
        if k in old_dict:
            new_dict[k].update(old_dict[k])
        else:
            del new_dict[k]
    return new_dict.values()

If you are doing this often I would suggest storing your data as dictionaries with the id as the key instead of lists, that way you wouldn't have to convert it each time.

edit: Here is an example showing how to store the data in a dictionary.

list_new = [{'desc': 'cool guy', 'id': 1, 'name': 'bob'}, {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}, {'desc': None, 'id': 3, 'name': 'Vasya'}]
# create a dictionary with the value of 'id' as the key
dict_new = dict((obj['id'], obj) for obj in list_new)
# now you can access entries by their id instead of having to loop through the list
print dict_new[2]
# {'id': 2, 'name': 'Bill', 'desc': 'bad guy'}

What do you mean dictionaries as the key? Can I have some doc link? Or see some example? — Pol, Mar 10 '11 at 14:59

kevpie · Accepted Answer · 2011-03-15T17:30:01.550

1

Steps:

Create a look up dictionary for list_old by id
Loop through list_new dicts creating a merged dict for each if it existed in old

Code:

def match_dict(new_list, old_list): 
    old = dict((v['id'], v) for v in old_list)
    return [dict(d, **old[d['id']]) for d in new_list if d['id'] in old]

EDIT: incorrectly named variables inside function.

edited Mar 15 '11 at 17:30

answered Mar 10 '11 at 04:42

kevpie

25,206
2
24
28

I like that solution. Beautiful. But it already was given by koblas. Thanks. – Pol Mar 10 '11 at 15:21
Only one problem is here that it dont save new objects. which can come as well. But I did not mentioned about it. – Pol Mar 10 '11 at 17:50
Just for reference this function doesn't return a result that matches the original match_dict() function. Since it has the lists reversed. – koblas Mar 14 '11 at 16:40
@koblas are you referring to the typos? new_list/list_new old_list/list_old. Those are a mistake. Thank you for pointing that out. – kevpie Mar 15 '11 at 17:27

kindall · Answer 7 · 2011-03-09T22:17:08.920

You'd be much better off if your top-level data structure was a dict rather than a list. Then it would be:

dict_new.update(dict_old)

However, for what you actually have, try this:

result_list = []
for item in list_new:
    found_item = [d for d in list_old if d["id"] == item["id"]]
    if found_item:
        result_list.append(dict(item, **found_item[0]))

This actually still has a loop inside a loop (the inner loop is "hidden" in the list comprehension) so it's still O(n**2). On large data sets it would undoubtedly be noticeably faster to convert it to a dict, update that, and then convert it back to a list.

score 0 · Answer 8 · edited Jun 15 '21 at 11:12

You could like this one:

def match_dict(new_list, old_list):
    id_new = [item_new.get("id") for item_new in list_new]
    id_old = [item_old.get("id") for item_old in list_old]

    for idx_old in id_old:
        if idx_old in id_new:
            list_new[id_new.index(idx_old)].update(list_old[id_old.index(idx_old)])

    return list_new

from pprint import pprint
pprint(match_dict(list_new, list_old))

Output:

[{'desc': 'cool guy', 'id': 1, 'name': 'boby', 'some_data': '12345'},
 {'desc': 'cool guy', 'id': 2, 'name': 'Bill', 'some_data': '12345'},
 {'desc': 'the man', 'id': 3, 'name': 'vasya', 'some_data': '12345'}]

score 0 · Answer 9 · answered Mar 10 '11 at 00:31

0

[od for od in list_old if od['id'] in {nd['id'] for nd in list_new}]

answered Mar 10 '11 at 00:31

Vamana

580
2
7

This one does not update additional data which come with new dict. – Pol Mar 10 '11 at 07:33

Match set of dictionaries. Most elegant solution. Python

9 Answers9

Linked