6

Given two lists of dictionaries, new one and old one. Dictionaries represent the same objects in both lists.
I need to find differences and produce new list of dictionaries where will be objects from new dictionaries only and updated attributes from old dictionaries.
Example:

   list_new=[
             { 'id':1,
               'name':'bob',
               'desc': 'cool guy'
              },
             
             { 'id':2,
               'name':'Bill',
               'desc': 'bad guy'
              },

              { 'id':3,
               'name':'Vasya',
               'desc': None
              },
        ]

    list_old=[
             { 'id':1,
               'name':'boby',
               'desc': 'cool guy',
                'some_data' : '12345'
              },
             { 'id':2,
               'name':'Bill',
               'desc': 'cool guy',
               'some_data' : '12345'

              },
              { 'id':3,
               'name':'vasya',
               'desc': 'the man',
               'some_data' : '12345'
              },
              { 'id':4,
               'name':'Elvis',
               'desc': 'singer',
               'some_data' : '12345'
              },
            ]
            

In that example I want produce new list where will be only new guys from list_new with updated data. Matched by id. So Bob will become Boby, Bill will become coll guy, Vasya become - the man. End Elvis have to be absent.

Give me an elegant solution. With less amount of iteration loops.

There is way, I resolve that. Which is not the best:

 def match_dict(new_list, old_list)
    ids_new=[]
    for item in new_list:
            ids_new.append(item['id'])
    result=[] 
    for item_old in old_medias:
        if item_old['id'] in ids_new:
            for item_new in new_list:
                if item_new['id']=item_old['id']
                    item_new['some_data']=item_old['some_data']
                    result.append(item_new)
    return result

The reason why I'm doubt, because there is loop inside loop. If there will be lists of 2000 items the process would take same time.

Nimantha
  • 6,405
  • 6
  • 28
  • 69
Pol
  • 24,517
  • 28
  • 74
  • 95
  • 1
    Are you retrieving this list from somewhere? Can you restructure the list of dictionaries using the __id__ as a key to a dictionary? – Mahmoud Abdelkader Mar 09 '11 at 21:51
  • I tried to use your code to diff your output with mine, but it doesn't work (syntax errors etc.). Please fix, thanks. – Paolo Mar 10 '11 at 00:18
  • The dictionaries going from mongodb. I'm trying to make it editable through django admin interface. I have typical django formset and dont want it to push each dict separately, It would make a lot of hits to database per one save in a page with formset. So i want to get it, match it, and than push with one hit. – Pol Mar 10 '11 at 05:29

9 Answers9

3

Can't quite get it to one line, but here's a simpler version:

def match_new(new_list, old_list) :
    ids = dict((item['id'], item) for item in new_list)
    return [ids[item['id']] for item in old_list if item['id'] in ids]
koblas
  • 25,410
  • 6
  • 39
  • 49
2

Not knowing the constraints of your data, I will suppose that id is unique in each list, and that your list contains only imutable types (string, int,...) which are hashable.

# first index each list by id
new = {item['id']: item for item in list_new}
old = {item['id']: item for item in list_old}

# now you can see which ids appeared in the new list
created = set(new.keys())-set(old.keys())
# or which ids were deleted
deleted =  set(old.keys())-set(new.keys())
# or which ids exists in the 2 lists
intersect = set(new.keys()).intersection(set(old.keys()))

# using the same 'conversion to set' trick,
# you can see what is different for each item
diff = {id: dict(set(new[id].items())-set(old[id].items())) for id in intersect}

# using your example data set, diff now contains the differences for items which exists in the two lists:
# {1: {'name': 'bob'}, 2: {'desc': 'bad guy'}, 3: {'name': 'Vasya', 'desc': None}}

# you can now add the new ids to this diff
diff.update({id: new[id] for id in created})
# and get your data back into the original format:
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

this is using python 3 syntax, but should be easily ported to python 2.

edit: here is the same code written for python 2.5:

new = dict((item['id'],item) for item in list_new)
old = dict((item['id'],item) for item in list_old)

created = set(new.keys())-set(old.keys())
deleted =  set(old.keys())-set(new.keys())
intersect = set(new.keys()).intersection(set(old.keys()))

diff = dict((id,dict(set(new[id].items())-set(old[id].items()))) for id in intersect)

diff.update(dict(id,new[id]) for id in created))
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

(note how the code is less readable without the dict comprehension)

Nimantha
  • 6,405
  • 6
  • 28
  • 69
Adrien Plisson
  • 22,486
  • 6
  • 42
  • 73
  • It's pretty good. There is 5 loops. But x*5 is less than x*x. if x sometime can equal 300. Thanks. – Pol Mar 10 '11 at 14:27
1

for each dictionary in old_list, search for the dictionary in new_list with the same id, then do: old_dict.update(new_dict)

eliminate each new_dict, after updating, from new_list and append the remaining, unused dicts after the loop.

joaquin
  • 82,968
  • 29
  • 138
  • 152
1

Something like this is what you need:

l = []
for d in list_old:
    for e in list_new:
        if e['id'] == d['id']:
            l.append(dict(e, **d))
print l

Read here on how to merge dictionaries.

Community
  • 1
  • 1
atx
  • 4,831
  • 3
  • 26
  • 40
1

You could do something like this:

def match_dict(new_list, old_list):
    new_dict = dict((obj['id'], obj) for obj in new_list)
    old_dict = dict((obj['id'], obj) for obj in old_list)
    for k in new_dict.iterkeys():
        if k in old_dict:
            new_dict[k].update(old_dict[k])
        else:
            del new_dict[k]
    return new_dict.values()

If you are doing this often I would suggest storing your data as dictionaries with the id as the key instead of lists, that way you wouldn't have to convert it each time.

edit: Here is an example showing how to store the data in a dictionary.

list_new = [{'desc': 'cool guy', 'id': 1, 'name': 'bob'}, {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}, {'desc': None, 'id': 3, 'name': 'Vasya'}]
# create a dictionary with the value of 'id' as the key
dict_new = dict((obj['id'], obj) for obj in list_new)
# now you can access entries by their id instead of having to loop through the list
print dict_new[2]
# {'id': 2, 'name': 'Bill', 'desc': 'bad guy'}
Andrew Clark
  • 202,379
  • 35
  • 273
  • 306
  • What do you mean dictionaries as the key? Can I have some doc link? Or see some example? – Pol Mar 10 '11 at 14:59
1

Steps:

  • Create a look up dictionary for list_old by id
  • Loop through list_new dicts creating a merged dict for each if it existed in old

Code:

def match_dict(new_list, old_list): 
    old = dict((v['id'], v) for v in old_list)
    return [dict(d, **old[d['id']]) for d in new_list if d['id'] in old]

EDIT: incorrectly named variables inside function.

kevpie
  • 25,206
  • 2
  • 24
  • 28
  • I like that solution. Beautiful. But it already was given by koblas. Thanks. – Pol Mar 10 '11 at 15:21
  • Only one problem is here that it dont save new objects. which can come as well. But I did not mentioned about it. – Pol Mar 10 '11 at 17:50
  • Just for reference this function doesn't return a result that matches the original match_dict() function. Since it has the lists reversed. – koblas Mar 14 '11 at 16:40
  • @koblas are you referring to the typos? new_list/list_new old_list/list_old. Those are a mistake. Thank you for pointing that out. – kevpie Mar 15 '11 at 17:27
0

You'd be much better off if your top-level data structure was a dict rather than a list. Then it would be:

dict_new.update(dict_old)

However, for what you actually have, try this:

result_list = []
for item in list_new:
    found_item = [d for d in list_old if d["id"] == item["id"]]
    if found_item:
        result_list.append(dict(item, **found_item[0]))

This actually still has a loop inside a loop (the inner loop is "hidden" in the list comprehension) so it's still O(n**2). On large data sets it would undoubtedly be noticeably faster to convert it to a dict, update that, and then convert it back to a list.

kindall
  • 178,883
  • 35
  • 278
  • 309
0

You could like this one:

def match_dict(new_list, old_list):
    id_new = [item_new.get("id") for item_new in list_new]
    id_old = [item_old.get("id") for item_old in list_old]

    for idx_old in id_old:
        if idx_old in id_new:
            list_new[id_new.index(idx_old)].update(list_old[id_old.index(idx_old)])

    return list_new

from pprint import pprint
pprint(match_dict(list_new, list_old))

Output:

[{'desc': 'cool guy', 'id': 1, 'name': 'boby', 'some_data': '12345'},
 {'desc': 'cool guy', 'id': 2, 'name': 'Bill', 'some_data': '12345'},
 {'desc': 'the man', 'id': 3, 'name': 'vasya', 'some_data': '12345'}]
Nimantha
  • 6,405
  • 6
  • 28
  • 69
Paolo
  • 20,112
  • 21
  • 72
  • 113
0
[od for od in list_old if od['id'] in {nd['id'] for nd in list_new}]
Vamana
  • 580
  • 2
  • 7