
I have multiple JSON files containing relational data which I need to merge. Each file has records indexed by a common key that appears in all files; in the example below, a0 and a1 are common keys. The value is a nested dictionary with multiple keys such as key1, key2, etc., as shown below. I need to merge the JSON files and get the output shown in dboutput.json, with the file name acting as the index in the merging operation. There is a related question which merges while losing information, but in my case I don't want any update that replaces existing keys or skips updates: on hitting an existing key, another nested dictionary indexed by the file name is created, as shown below:

Example:

File db1.json:


"a0": {
        "commonkey": [
            "a1", 
            "parentkeyvalue1"
        ], 
        "key1": "kvalue1", 
        "key2": "kvalue2"
        "keyp": "kvalue2abc"

    }, 
"a1": { 
...
}

File db2.json:


"a0": {
        "commonkey": [
            "a1", 
            "parentkeyvalue1"
        ], 
        "key1": "kvalue1xyz", 
        "key2": "kvalue2",
        "key3": "kvalue2"



    }, 

"a1": { 
...
}

Desired Output

File dboutput.json:

"a0": {
        "commonkey": [
            "a1", 
            "parentkeyvalue1"
        ], 
        "key1": {"db1":"kvalue1","db2":"kvalue1xyz"} ,
        "key2": {"db1":"kvalue2","db2":"kvalue2"} ,
        "key3": {"db2":"kvalue2"}
        "keyp": {"db1":"kvalue2abc"}



    }, 
"a1": { 
...
}

So how to do such lossless merges? Note that in "key2": {"db1": "kvalue2", "db2": "kvalue2"} the key/value pairs are the same in both files, yet they still need to be stored separately. In effect the output is a union of all input files and contains all entries from every file.

Also

"commonkey": [
            "a1", 
            "parentkeyvalue1"
        ],

will be the same for all files and hence need not be repeated.

2 Answers


I finally managed to get it:

import collections
import json

class NestedDict(collections.OrderedDict):
    """Implementation of Perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # On a missing key, create and store a fresh NestedDict in place
            value = self[item] = type(self)()
            return value

def mergejsons(jsns):
    # Use the autovivifying NestedDict so nested levels are
    # created automatically on first assignment
    op = NestedDict()
    for j in jsns:
        with open(j) as f:
            jdata = json.load(f)
        jname = j.split('.')[0]  # e.g. "db1.json" -> "db1"
        for commnkey, val in jdata.items():
            for k, v in val.items():
                if k != 'commonkey':
                    # Sub-index every ordinary key by the file name
                    op[commnkey][k][jname] = v
                elif 'commonkey' not in op[commnkey]:
                    # commonkey is identical in all files; store it once
                    op[commnkey][k] = v
    return op
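Because NestedDict autovivifies, reading a missing key creates the intermediate dictionary on the spot, so the deep assignment op[commnkey][k][jname] = v never raises a KeyError. A minimal usage sketch, reusing the json import and mergejsons above, and assuming db1.json and db2.json sit in the working directory:

merged = mergejsons(["db1.json", "db2.json"])

# NestedDict subclasses OrderedDict, so json can serialize it directly
with open("dboutput.json", "w") as f:
    json.dump(merged, f, indent=4)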

A simple solution is to iterate through each JSON object and add the key/value pairs under each "commonkey" as you see them. Here's an example that loads each JSON file from a hardcoded list and merges them iteratively.

#!/usr/bin/python
import json

# Hardcoded list of JSON files
dbs = ["db1.json", "db2.json"]
output = {}  # stores all the merged output

for db in dbs:
    # Derive the sub-index name from the file name and load the JSON
    db_name = db.split(".json")[0]
    with open(db) as f:
        obj = json.load(f)

    # Iterate through the common keys, adding them only if they're new
    for common_key, data in obj.items():
        if common_key not in output:
            output[common_key] = dict(commonkey=data["commonkey"])

        # Within each common key, add key/val pairs
        # sub-indexed by the database name
        for key, val in data.items():
            if key != "commonkey":
                if key in output[common_key]:
                    output[common_key][key][db_name] = val
                else:
                    output[common_key][key] = {db_name: val}

# Output the resulting JSON to a file
with open("dboutput.json", "w") as f:
    json.dump(output, f, sort_keys=True, indent=4, separators=(',', ': '))
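The two membership tests can also be collapsed with dict.setdefault; a minimal sketch of the inner merge loop rewritten that way, reusing the obj, db_name, and output names from the snippet above:

for common_key, data in obj.items():
    # Create the entry with its shared commonkey on first sight
    entry = output.setdefault(common_key, {"commonkey": data["commonkey"]})
    for key, val in data.items():
        if key != "commonkey":
            # Create the per-key sub-dict on first sight, then index by file
            entry.setdefault(key, {})[db_name] = val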