I am trying to combine two nested Python dictionaries. Each has 10 keys at the top level, and each of those 10 keys maps to a dictionary with 2 more keys: 'datetimes' and 'values'. At the lowest level, each of those keys holds about 100,000 items.

The two dictionaries come from two .pkl files, which I unpickle with the load function below. Is there a way to get a single dictionary directly from these two .pkl files? If not, how can I combine the two dictionaries into one?

I have tried this solution, but it overwrites one dictionary with the other, and I couldn't get this solution to work because I have dictionaries, not lists with indices as in the example. Using .copy() as suggested here also overwrites one dictionary with the other. It would be great if I could just append one dictionary to the other, but this post seems to suggest that dictionaries don't work like that.

So I thought I could create arrays out of these dictionaries and then reshape and concatenate them, but that is incredibly slow. Here is what I have so far:

import cPickle
import numpy as np

def load(filename, verbose=False):
    if verbose:
        print("Loading %s" % filename)
    # Load from the pickle file; the with block closes it even on error.
    with open(filename, 'rb') as pkl_file:
        data = cPickle.load(pkl_file)

    return data

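def combineDicts(dictList):
    # Flatten every innermost sequence of every dictionary into one 1-D array.
    result = np.array([])
    for listItem in dictList:
        data = np.array([])
        for item in listItem.keys():
            for innerItem in listItem[item].keys():
                # np.append copies the whole array on every call.
                data = np.append(data, listItem[item][innerItem])
        result = np.append(result, data)
    return result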

So I am trying to run these commands:

>>> dict1 = load('file1.pkl', verbose = True)
>>> dict2 = load('file2.pkl', verbose = True)
>>> a = combineDicts([dict1, dict2])
Aina

1 Answer

If I understand your issue correctly, you can accomplish what you want with a dict comprehension (Python 2.7 and 3.x):

>>> dict1 = {'topkey1': {'datetimes': [9,8], 'values': [7,6]}, 'topkey2': {'datetimes': [5,4], 'values': [3,2]}}
>>> dict2 = {'topkey3': {'datetimes': [9,8], 'values': [7,6]}, 'topkey4': {'datetimes': [5,4], 'values': [3,2]}}
>>> dictlist = [dict1, dict2]
>>> new_dict = {key: value for item in dictlist for key, value in item.items()}
>>> new_dict
{'topkey4': {'values': [3, 2], 'datetimes': [5, 4]}, 'topkey1': {'values': [7, 6], 'datetimes': [9, 8]}, 'topkey3': {'values': [7, 6], 'datetimes': [9, 8]}, 'topkey2': {'values': [3, 2], 'datetimes': [5, 4]}}

If this isn't the result you're looking for, please give examples of the initial dict structure and of the final structure you want.
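
One caveat: the comprehension only combines dictionaries whose top-level keys differ. When the same key appears in more than one dictionary, the later value replaces the earlier one. A minimal sketch, assuming both dictionaries share their top-level keys (the sample data is made up for illustration):

```python
# Illustrative data: both dictionaries use the same top-level keys.
dict1 = {'topkey1': {'datetimes': [9, 8], 'values': [7, 6]},
         'topkey2': {'datetimes': [5, 4], 'values': [3, 2]}}
dict2 = {'topkey1': {'datetimes': [29, 28], 'values': [17, 16]},
         'topkey2': {'datetimes': [35, 34], 'values': [43, 42]}}
dictlist = [dict1, dict2]

# Same comprehension as above; a repeated key keeps only the last value seen.
new_dict = {key: value for item in dictlist for key, value in item.items()}

print(new_dict['topkey1']['datetimes'])  # [29, 28] -- dict1's [9, 8] is gone
```

So with identical top-level keys, the inner lists have to be concatenated explicitly.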

Edit:

Based on the information in your comment, the following should help. It modifies dict1 in place and assumes that every key and subkey in dict2 also exists in dict1:

>>> dict1 = {'topkey1': {'datetimes': [9,8], 'values': [7,6]}, 'topkey2': {'datetimes': [5,4], 'values': [3,2]}}
>>> dict2 = {'topkey1': {'datetimes': [29,28], 'values': [17,16]}, 'topkey2': {'datetimes': [35,34], 'values': [43,42]}}
>>> for key, value in dict2.items():
...     for subkey, subvalue in value.items():
...         dict1[key][subkey] = dict1[key][subkey] + subvalue
...
>>> dict1
{'topkey1': {'values': [7, 6, 17, 16], 'datetimes': [9, 8, 29, 28]}, 'topkey2': {'values': [3, 2, 43, 42], 'datetimes': [5, 4, 35, 34]}}
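
If you would rather not modify dict1, or if a key might be missing from one of the files, a small helper can build a new dictionary instead. This is only a sketch: merge_nested is a hypothetical name, and it assumes every innermost value is an iterable (list, tuple, or array) whose items should be concatenated in the order the dictionaries are given.

```python
def merge_nested(dict_list):
    """Merge dicts shaped like {top_key: {sub_key: sequence}} by concatenation.

    Hypothetical helper: returns a new dict and leaves the inputs untouched.
    """
    merged = {}
    for d in dict_list:
        for key, sub in d.items():
            # Create the inner dict the first time a top-level key is seen.
            target = merged.setdefault(key, {})
            for subkey, seq in sub.items():
                # extend() accepts any iterable and copies its items
                # into a fresh list owned by the merged result.
                target.setdefault(subkey, []).extend(seq)
    return merged


dict1 = {'topkey1': {'datetimes': [9, 8], 'values': [7, 6]},
         'topkey2': {'datetimes': [5, 4], 'values': [3, 2]}}
dict2 = {'topkey1': {'datetimes': [29, 28], 'values': [17, 16]},
         'topkey2': {'datetimes': [35, 34], 'values': [43, 42]}}

combined = merge_nested([dict1, dict2])
print(combined['topkey1']['datetimes'])  # [9, 8, 29, 28]
```

Because it uses extend() rather than +, this also behaves sensibly if the unpickled values turn out to be NumPy arrays: + on arrays adds them element by element instead of concatenating. The merged values are always plain lists.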
sgallen
  • sgallen, in the example you give, topkey1 and topkey3 are really the same name, so both dictionaries have identical top-level keys. The difference between the two is that one has data from one year and the other has data from another year. So if I modify your example like this: `dict1 = {'topkey1': {'datetimes': [9,8], 'values': [7,6]}, 'topkey2': {'datetimes': [5,4], 'values': [3,2]}}` `dict2 = {'topkey1': {'datetimes': [29,28], 'values': [17,16]}, 'topkey2': {'datetimes': [35,34], 'values': [43,42]}}` – Aina Jan 14 '12 at 20:25
  • then the output would be: `{'topkey1': {'datetimes': [9,8,29,28], 'values': [7,6,17,16]}, 'topkey2': {'datetimes': [5,4,35,34], 'values': [3,2,43,42]}}` I think a dict comprehension is the way to go, but my 5 weeks of experience with Python can't quite get me there. I think your solution should work if I modify the comprehension somehow to reflect the output I am after. Thanks, Aina. – Aina Jan 14 '12 at 20:32