Is there a good, controllable way to compare two .json files and generate their difference into an Excel Sheet in Python

Question

I have tried using jsondiff but the output was not upto my requirements.

with open('C/abc/file1.json') as f:
        a= json.load(f)
    
    with open('C/abc/file2.json') as s:
        b= json.load(s)
    c= diff(a,b)

What I want to print in Excel sheet is Delta, the things which are present in file1 but not in file2, present in file1 but changed in file2, newly added things in file2 that are not in file1 and if possible then line number too. So if anyone got any idea how to achieve this, then please share it and if need more clarifications then plzz let me know.

I cannot paste the original content of file due to limitations but how the content inside those file look i have put below.

file1.json

    {
  "Indicator": {
    "key1": "value 1",
    "key2": "value 2",
    "Name": "value 3",
    "key4": "value 4",
    "Description": "some text",
    "Subformulas": {
      "some key": "some sub-formula"
    },
    "Formula": "some formula"
  }
}

file2.json

    {
  "Indicator": {
    "key1": "value 1",
    "key2": "value 2",
    "key3":"value changed",
    "Name": "value 3",
    "key4": "value 4",
    "Description": "some text",
    "Subformulas": {
      "some key": "change in some sub-formula"
    },
    "Formula": "some formula"
  }
}

I have used this for printing difference in Excel sheet.. code for which I've found on an answer in one of the questions in Stack Overflow, but it is not printing difference into excel sheet...on terminal it is showing difference but in excel sheet it is not. So i think i am doing something wrong and I got no idea how to correct that.

c.items()
c1=list(c.items())
workbook = xlsxwriter.Workbook('myfile.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0

order=sorted(c.keys())

for key in order:
    row += 1
    #print(key)
    worksheet.write(row,    col,     key)
    for item in c[key]:
        #print(item,row, col+1)
        worksheet.write(row, col + 1, item)
        col += 1
    col = 0

workbook.close()

What's the structure of the json files? Can you post data samples? — RJ Adriaansen, Oct 10 '21 at 21:16
Requesting libraries/software is specifically off-topic on StackOverflow. Show your honest attempt to get the outut you need as code in your question - i.e. _not_ just the code you’re saying is inadequate. — DisappointedByUnaccountableMod, Oct 10 '21 at 21:16
`difflib` is in the standard library so it's still on-topic i guess — diggusbickus, Oct 10 '21 at 21:28
@RJAdriaansen I have updated the question, sorry but I cannot paste the original content of .json files due to regulations, I just removed the content but the format is same. — Master, Oct 11 '21 at 09:01
@balmy I think I am not asking anything off-topic here, it's just when I am trying to print into excel sheet there I am having problems...I have tried to do research about xlsxwriter everywhere but I can't quite understand anything on how to use it — Master, Oct 11 '21 at 09:08

DisappointedByUnaccountableMod · Accepted Answer · 2021-10-11T18:33:44.777

If you read the source code of jsondiff https://github.com/xlwings/jsondiff/tree/05dd7dd6c0b712fe54491289d3972ab58a125e11/jsondiff you'll see that there are a few options for the 'syntax' of the results - this isn't mentioned in the brief documentation. The default is 'compact' which seems to show just the changed things, as you've found. If you chooose 'explicit' syntax you get instead 'insert', 'update', 'delete' etc. in the diff output. There's another syntax 'symetric' which you could explore.

So change:

c = jsondiff.diff(a,b)

to:

c = jsondiff.diff(a,b,syntax='explicit')

I modified your file2 so it adds and deletes some keys:

b = {
  "Indicator": {
    "key1": "value 1",
    "key3":"value changed",
    "Name": "value 3",
    "key4": "value 4",
    "key5": "added",
    
    "Description": "some text",
    "Subformulas": {
      "some key": "change in some sub-formula"
    },
    "Formula": "some formula"
  }
}

Any the result in the c result of diff is (pprint-ed):

{update: {'Indicator': {insert: {'key3': 'value changed', 'key5': 'added'},
                        delete: ['key2'],
                        update: {'Subformulas': {update: {'some key': 'change '
                                                                      'in some '
                                                                      'sub-formula'}}}}}}

which the xls writing code will have to 'walk' the nested dictionaries to produce output.

A bit of research and I found this code for a generic list/dictionary walker objwalk() https://code.activestate.com/recipes/577982-recursively-walk-python-objects/

I made the choice to flatten the path of keys by joining their names with / - obviously if your keys might include / you'll have to adapt this.

Here's the resulting code as a minimal reproducible example - NOTE when you next create a question here you should take the same approach with the code you put into your question, i.e. provide everything needed to run the code.

import collections.abc
import jsondiff
import pprint

import xlsxwriter

a = {
  "Indicator": {
    "key1": "value 1",
    "key2": "value 2",
    "Name": "value 3",
    "key4": "value 4",
    "Description": "some text",
    "Subformulas": {
      "some key": "some sub-formula"
    },
    "Formula": "some formula"
  }
}

b = {
  "Indicator": {
    "key1": "value 1",
    "key3":"value changed",
    "Name": "value 3",
    "key4": "value 4",
    "key5": "added",
    
    "Description": "some text",
    "Subformulas": {
      "some key": "change in some sub-formula"
    },
    "Formula": "some formula"
  }
}

c = jsondiff.diff(a,b,syntax='explicit')

pprint.pprint( c)

# dual python 2/3 compatability, inspired by the "six" library
string_types = (str, unicode) if str is bytes else (str, bytes)
iteritems = lambda mapping: getattr(mapping, 'iteritems', mapping.items)()
# from https://code.activestate.com/recipes/577982-recursively-walk-python-objects/
def objwalk(obj, path=(), memo=None):
    if memo is None:
        memo = set()
    iterator = None
    if isinstance(obj, collections.abc.Mapping):
        iterator = iteritems
    elif isinstance(obj, (collections.abc.Sequence, collections.abc.Set)) and not isinstance(obj, string_types):
        iterator = enumerate
    if iterator:
        if id(obj) not in memo:
            memo.add(id(obj))
            for path_component, value in iterator(obj):
                for result in objwalk(value, path + (path_component,), memo):
                    yield result
            memo.remove(id(obj))
    else:
        yield path, obj
       
# from https://stackoverflow.com/a/5389547/2318649
def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return zip(*[iter(iterable)]*n)

workbook = xlsxwriter.Workbook('myfile.xlsx')
worksheet = workbook.add_worksheet()

row = 0
# write the column headings
worksheet.write(row, 0, "Path" )
worksheet.write(row, 1, "Change" )
worksheet.write(row, 2, "New Value")

for o in objwalk(c):
    print( f"{o=}" )
    keypath = []
    for op,key in grouped(o[0],2):
        if type(key)==str:
            keypath.append(str(key))
#        print( f"  {op=}" )
        print( f"  {op=} {key=}" )
        lastop = op
    print( f"{keypath=} {lastop=} {o[1]=}" )
    row += 1
    # write the values for this row
    worksheet.write(row, 0, "/".join(keypath) )
    worksheet.write(row, 1, repr(lastop))
    worksheet.write(row, 2, o[1])
    
workbook.close()

Output:

The 'delete 'row looks odd with the deleted key showing up in the 'New Value' column, but that's the way jsondiff produces its output for explicit syntax. You can of course process it differently.

Btw this solution is working great and does the job perfectly. There are just this warning coming up DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working if isinstance(obj, collections.Mapping):. I have updated all the libraries but still this is coming up. — Master, Oct 11 '21 at 18:27
and from next time i'll put all of the required code....thanks for pointing that out — Master, Oct 11 '21 at 18:29
change all `collections` to `collections.abc` - I've edited the code in my answer — DisappointedByUnaccountableMod, Oct 11 '21 at 18:32

Is there a good, controllable way to compare two .json files and generate their difference into an Excel Sheet in Python

1 Answers1