Say that I have a list of dicts:

list = [{'name':'john','age':'28','location':'hawaii','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'parker','age':'24','location':'new york','gender':'male'}]

In these dicts, 'name' can be considered a unique identifier. My goal is not only to dedup this list for identical dicts (i.e. list[1] and list[2]), but also to merge/append differing values for a single 'name' (i.e. list[0] and list[1]/list[2]). In other words, I want to merge all of the 'name'='john' dicts in my example into a single dict, like so:

dedup_list = [{'name':'john','age':'28; 32','location':'hawaii; colorado','gender':'male'},
              {'name':'parker','age':'24','location':'new york','gender':'male'} ]

So far I have tried creating a second list, dedup_list, and iterating through the first list. If the 'name' value does not already exist in one of dedup_list's dicts, I append the dict. The merging part is where I am stuck.

for dict in list:
    for new_dict in dedup_list:
        if dict['name'] in new_dict:
            # MERGE OTHER DICT FIELDS HERE
        else:
            dedup_list.append(dict) # This will create duplicate values as it iterates through each row of the dedup_list.  I can throw them in a set later to remove?

My list of dicts will never contain more than 100 items, so an O(n^2) solution is definitely acceptable but not necessarily ideal. This dedup_list will eventually be written to a CSV, so if there is a solution involving that, I am all ears.

Thanks!

MTP
  • Have you tried something? This is very close to http://stackoverflow.com/questions/5946236/how-to-merge-multiple-dicts-with-same-key or http://stackoverflow.com/questions/13718558/merging-3-dicts-in-python or even http://stackoverflow.com/questions/9415785/merging-several-python-dictionaries – hivert Mar 12 '14 at 17:58
  • What have you tried so far? Have you had any problems working on an algorithm to do that? Are you aware that Stack Overflow is a site to get *actual* problems solved, not work done for you? – zmo Mar 12 '14 at 17:58
  • You'll lose the information about which age belongs to the one who lives in colorado. (In your example, luckily not, but change the age of the 3rd john to 42 and you'll see that you have no way of knowing whether john 32 is from hawaii or colorado.) – njzk2 Mar 12 '14 at 18:00
  • Yes, yes...you are right zmo. I will edit my question. My deepest apologies. – MTP Mar 12 '14 at 18:02
  • That's alright, njzk2, age --> location mapping does not matter in this instance. – MTP Mar 12 '14 at 18:12

1 Answer

Well, I was about to craft a solution around defaultdict, but fortunately @hivert already linked to the best solution I could have come up with, which is in this answer:

from collections import defaultdict

dicts = [{'a':1, 'b':2, 'c':3},
         {'a':1, 'd':2, 'c':'foo'},
         {'e':57, 'c':3} ]

super_dict = defaultdict(set)  # uses set to avoid duplicates

for d in dicts:
    for k, v in d.items():  # items() works on both Python 2 and Python 3
        super_dict[k].add(v)

That is why I'm voting to close this question as a duplicate of that one.

N.B.: you won't get values such as '28; 32'; instead you get a set such as {'28', '32'}, which can then be processed into a CSV file as you wish.
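Applied to the question's data, the same idea needs one extra step, because super_dict above collects values across all dicts rather than per 'name'. The following is only a sketch under a few assumptions: 'name' is the grouping key, all values are strings, and the merged values should keep their first-seen order (so a plain dict is used as an ordered set, which relies on Python 3.7+). The helper name merge_by_name and the variable people are made up for illustration.

```python
def merge_by_name(records, key='name', sep='; '):
    """Merge dicts that share the same `key`, joining distinct values with `sep`."""
    merged = {}  # key value -> {field: {value: None, ...}}
    for record in records:
        fields = merged.setdefault(record[key], {})
        for field, value in record.items():
            # Dict keys act as an ordered set: duplicates collapse, order is kept.
            fields.setdefault(field, {})[value] = None
    return [{field: sep.join(values) for field, values in fields.items()}
            for fields in merged.values()]


people = [{'name': 'john', 'age': '28', 'location': 'hawaii', 'gender': 'male'},
          {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
          {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
          {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'}]

dedup_list = merge_by_name(people)
```

Each record is visited once, so this is linear in the number of records rather than O(n^2). As njzk2 points out in the comments, the pairing between individual ages and locations is not preserved.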

N.B.2: to write the CSV file, have a look at the csv.DictWriter class.
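A minimal sketch of the DictWriter step, assuming the merged rows already hold plain strings and that the column order below is the one you want. The helper name write_rows is hypothetical, and an in-memory buffer stands in for a real file here.

```python
import csv
import io

rows = [
    {'name': 'john', 'age': '28; 32', 'location': 'hawaii; colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]


def write_rows(rows, fileobj, fieldnames=('name', 'age', 'location', 'gender')):
    """Write a header line followed by one CSV line per dict in `rows`."""
    writer = csv.DictWriter(fileobj, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()  # stand-in for a file opened with open(path, 'w', newline='')
write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file object opened with newline='' as the csv module documentation recommends, for example open('people.csv', 'w', newline=''), where the file name is just a placeholder.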

Community
zmo
  • Thank you for pointing me in that direction. I will go ahead and close the question. – MTP Mar 12 '14 at 18:14