
I have two lists of nested dictionaries:

lofd1 = [{'A': {'facebook':{'handle':'https://www.facebook.com/pages/New-Jersey/108325505857259','logo_id': None}, 'contact':{'emails':['nj@nj.gov','state@nj.gov']},'state': 'nj', 'population':'12345', 'capital':'Jersey','description':'garden state'}}]
lofd2 = [{'B':{'building_type':'ranch', 'city':'elizabeth', 'state':'nj', 'description':'the state close to NY'}}]

I need to:

  • Merge matching dictionaries across the lists, using the value of the 'state' key (for example, merge all dictionaries where "state" = "nj" into a single dictionary).
  • Key/value pairs that are present in both dictionaries should appear only once (for example, "state" should be "nj" for both).
  • Key/value pairs that are present in only one of the dictionaries should also be included (for example, "population" and "capital" from lofd1, and "building_type" and "city" from lofd2).
  • Some of the values in the dictionaries should be excluded, for example 'logo_id': None.
  • The "description" values from both dictionaries should be collected into a list of strings, for example 'description': ['garden state', 'the state close to NY'].

The final dataset should look like this:

lofd_final = [{'state': 'nj', 'facebook':{'handle':'https://www.facebook.com/pages/New-Jersey/108325505857259'},'population':'12345', 'capital':'Jersey', 'contact':{'emails':['nj@nj.gov','state@nj.gov']}, 'description': ['garden state','the state close to NY'],'building_type':'ranch', 'city':'elizabeth'}]

What would be an efficient solution?

Feyzi Bagirov
  • Have you had a look at any solutions proposed here: https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression? Might help with what you are trying to achieve – Jesse Sep 10 '18 at 05:21
  • @Jesse I looked at similar solutions; the problem is that I have a list of dictionaries, not a stand-alone dictionary – Feyzi Bagirov Sep 10 '18 at 05:27
  • What is the possible structure of the dictionaries? Does each dictionary only have a single top-level key, like in your example? Or can a single dictionary have multiple keys, e.g. 'A', 'B', 'C'? – Tom Sep 10 '18 at 05:31
  • @the-realtom lofd2 has a single top level, lofd1 has a few nested dictionaries and lists inside the dictionaries (I updated the example) – Feyzi Bagirov Sep 10 '18 at 05:44

1 Answer


This solution is very specific to your case. Its time complexity is O(n*m), where n is the number of dictionaries in a list and m is the number of keys in a dictionary, because each key in each dictionary is visited exactly once.

def extract_data(lofd, output):
    for d in lofd:
        for top_level_key in d: # This will be the A or B key from your example
            data = d[top_level_key] 
            state = data['state']
            if state not in output: # Create the state entry for the first time
                output[state] = {}
            # Now update the state entry with the data you care about
            for key in data:
                # Handle descriptions
                if key == 'description':
                    if 'description' not in output[state]:
                        output[state]['description'] = [data['description']]
                    else:
                        output[state]['description'].append(data['description'])
                # Handle all other keys
                else:
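                    # Handle facebook key: copy it without logo_id, so the input
                    # is not mutated and a missing logo_id raises no KeyError
                    if key == 'facebook':
                        output[state][key] = {k: v for k, v in data[key].items() if k != 'logo_id'}
                    else:
                        output[state][key] = data[key]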

output = {}
extract_data(lofd1, output)
extract_data(lofd2, output)
print(list(output.values()))

The output is a dict of dicts, with the states as the top-level keys. To convert it to the format you specified, extract the values into a flat list with list(output.values()), as in the last line of the example above.

Note: I am assuming a deep copy is not needed. So after you extract the data, I'm assuming you don't go and manipulate the values in lofd1 and lofd2. Also this is purely based on the specs that were given, e.g. if there are more nested keys that need to be excluded, you will need to add extra filters yourself.
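
If more keys than logo_id may need excluding, one possible generalization is to drop every None-valued entry recursively while copying. The following is only a sketch under that assumption, not a drop-in replacement: the helper names strip_none and merge_by_state are made up for illustration, and it assumes every inner dictionary has a 'state' key and that "excluded" means "value is None".

```python
from collections import defaultdict


def strip_none(value):
    """Return a copy of value with None-valued dict entries removed (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_none(v) for v in value]
    return value  # strings, numbers, etc. are returned unchanged


def merge_by_state(*lists_of_dicts):
    """Merge the inner dicts of several lists, grouped by their 'state' value."""
    merged = defaultdict(dict)
    for lofd in lists_of_dicts:
        for d in lofd:
            for data in d.values():  # the 'A' / 'B' level from the question
                entry = merged[data['state']]  # assumes 'state' is always present
                for key, value in strip_none(data).items():
                    if key == 'description':
                        # Collect descriptions from every source into one list
                        entry.setdefault('description', []).append(value)
                    else:
                        entry[key] = value  # later lists win on conflicting keys
    return list(merged.values())


# Sample data copied from the question
lofd1 = [{'A': {'facebook': {'handle': 'https://www.facebook.com/pages/New-Jersey/108325505857259', 'logo_id': None},
                'contact': {'emails': ['nj@nj.gov', 'state@nj.gov']},
                'state': 'nj', 'population': '12345', 'capital': 'Jersey',
                'description': 'garden state'}}]
lofd2 = [{'B': {'building_type': 'ranch', 'city': 'elizabeth', 'state': 'nj',
                'description': 'the state close to NY'}}]

result = merge_by_state(lofd1, lofd2)
print(result)
```

Because strip_none builds new dicts and lists, the result shares no mutable containers with lofd1 or lofd2, so the inputs are left untouched. For keys other than 'description' that appear in several dictionaries with different values, the value from the later list overwrites the earlier one; adjust that branch if you need a different rule.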

Tom