
DeepDiff results look like:

{'dictionary_item_added': [root[5], root[6]], 'dictionary_item_removed': [root[4]]}

For human review of changes, this only works for small examples. I need something like the file diffs GitHub displays in commits and pull requests, but for JSON.

So here is my question:

How to convert DeepDiff output to something like human readable diff?

Why I don't want to drop DeepDiff and use git-diff

Unlike code, JSON doesn't care about formatting, and it doesn't care about the order of keys in objects (dictionaries).

I could work around this without DeepDiff by pre-sorting all dictionaries in the JSON and then comparing the results with git-diff. Yet writing files to disk and shelling out to git-diff is messy, while just calling DeepDiff(t1, t2) is very clean.
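For reference, the pre-sorting step on its own doesn't need files or git: `json.dumps(..., indent=4, sort_keys=True)` produces a canonical text form in memory, with one value per line, so key order and original whitespace stop mattering. A minimal sketch; `canonical_json` is a hypothetical helper name:

```python
import json


def canonical_json(obj) -> str:
    # Hypothetical helper: sort_keys=True makes key order irrelevant, and
    # indent=4 puts each value on its own line, ready for a line-based diff.
    return json.dumps(obj, indent=4, sort_keys=True)


# Same data, different key order and different whitespace.
text_a = '{"b": 1, "a": [1, 2]}'
text_b = '{\n  "a": [1,2],\n  "b": 1\n}'

print(canonical_json(json.loads(text_a)) == canonical_json(json.loads(text_b)))  # True
print(canonical_json(json.loads(text_a)))
```

This only normalizes object key order and layout; array order is still significant, which matches JSON semantics.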

The example I'm looking at is:

```python
from deepdiff import DeepDiff
t1 = {1:1, 3:3, 4:4}
t2 = {1:1, 3:3, 5:5, 6:6}
ddiff = DeepDiff(t1, t2)
print(ddiff)
```

Specifics that I'm looking for

I'd like to see words highlighted within values that got changed, like so:

[Image: diff with words highlighted]

With a few differences:

  • The screenshot shows code, but the same display works just as well for JSON
  • I only need this for text-based terminals that support ANSI colors
  • I'm looking for a way to do this in Python or C++
  • The GitHub screenshot has the idea that I like: show lines with - / + and highlight words within each line
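The "highlight words within each line" part can be approximated with `difflib.SequenceMatcher` from the standard library: split a removed line and its replacement into word tokens, then mark the tokens the matcher reports as non-equal. A minimal sketch under a few assumptions: reverse video (ANSI SGR 7, switched off with SGR 27) is used as the highlight, the tokenizer regex is an arbitrary choice, and `tokenize`/`highlight_pair` are hypothetical names:

```python
import difflib
import re

RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"
HL_ON, HL_OFF = "\033[7m", "\033[27m"  # reverse video on/off (assumed highlight style)


def tokenize(line: str) -> list[str]:
    # Words, whitespace runs and single punctuation characters;
    # "".join(tokenize(s)) == s, so nothing is lost.
    return re.findall(r"\w+|\s+|[^\w\s]", line)


def highlight_pair(old: str, new: str) -> tuple[str, str]:
    """Wrap the tokens that differ between old and new in HL_ON/HL_OFF."""
    a, b = tokenize(old), tokenize(new)
    old_out, new_out = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part, new_part = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "equal":
            old_out.append(old_part)
            new_out.append(new_part)
            continue
        # "replace", "delete" or "insert": highlight whichever side has text
        if old_part:
            old_out.append(HL_ON + old_part + HL_OFF)
        if new_part:
            new_out.append(HL_ON + new_part + HL_OFF)
    return "".join(old_out), "".join(new_out)


old_line = '    "title": "The quick brown fox",'
new_line = '    "title": "The quick red fox",'
hl_old, hl_new = highlight_pair(old_line, new_line)
print(RED + "- " + hl_old + RESET)
print(GREEN + "+ " + hl_new + RESET)
```

Applying this to a whole diff still requires deciding which `-` line pairs with which `+` line; pairing lines by position inside a replaced block is one simple strategy, but it is an assumption, not something `difflib` does for you.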
Aleksandr Levchuk
  • Can you tell us what you have already tried to achieve this? We won't be able to help you without knowing your existing code. – Syed M. Sannan Nov 06 '22 at 13:27
  • This [gist](https://gist.github.com/ines/04b47597eb9d011ade5e77a068389521) may be able to help. – LeoDog896 Nov 06 '22 at 13:33
  • Also, this may be a duplicate of [this](https://stackoverflow.com/questions/32500167/how-to-show-diff-of-two-string-sequences-in-colors) (which also may be where the gist originated from) – LeoDog896 Nov 06 '22 at 13:35
  • I believe this is more of a freelance project than a Stack Overflow question with a bounty. – M. Elghamry Nov 06 '22 at 13:36
  • I ran the DeepDiff example and this: `echo -e '{\n "1": 1,\n "3": 3,\n "4": 4\n}' > /tmp/left; echo -e '{\n "1": 1,\n "3": 3,\n "5": 5\n "6": 6\n}' > /tmp/right; git diff /tmp/left /tmp/right; git diff --color-words=. /tmp/left /tmp/right` yet here I don't like how git-diff displays word differences. The GitHub screenshot has the best idea: show lines with - / + and highlight words within each line. – Aleksandr Levchuk Nov 06 '22 at 13:37
  • @LeoDog896 unlike that question I'm looking to show lines with - / + and highlight words within each line – Aleksandr Levchuk Nov 06 '22 at 13:45

1 Answer


`difflib.ndiff` from the standard library may be what you're looking for:

```python
import difflib
import json
from typing import Callable

t1 = {1:1, 3:3, 4:4}
t2 = {1:1, 3:3, 5:5, 6:6}

# ANSI escape codes: 31 = red, 32 = green, 0 = reset all attributes
RED: Callable[[str], str] = lambda text: f"\033[31m{text}\033[0m"
GREEN: Callable[[str], str] = lambda text: f"\033[32m{text}\033[0m"

def get_edits_string(old: str, new: str) -> str:
    result = ""

    lines = difflib.ndiff(old.splitlines(keepends=True), new.splitlines(keepends=True))

    for line in lines:
        line = line.rstrip()
        if line.startswith("+"):
            result += GREEN(line) + "\n"
        elif line.startswith("-"):
            result += RED(line) + "\n"
        elif line.startswith("?"):
            continue  # skip ndiff's intraline guide lines
        else:
            result += line + "\n"

    return result

# sort_keys=True makes the comparison independent of key order
print(
    get_edits_string(
        json.dumps(t1, indent=4, sort_keys=True),
        json.dumps(t2, indent=4, sort_keys=True)
    )
)
```


This is also helpful for plain CLIs without color: besides the `+`/`-` lines, `ndiff` emits `?` guide lines that mark where within a line the changes are. I've filtered those out in the code, but they're useful when colors aren't available.
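Instead of skipping the `?` lines, they can be used to highlight the changed characters in color. Each `?` line directly follows the `-` or `+` line it describes, and after the two-character prefix a non-blank marker at position i points at character i of that line. A sketch built on that layout; `apply_guide` and `colorize_ndiff` are hypothetical names, reverse video is an assumed highlight style, and because `ndiff` only emits `?` lines for line pairs it considers similar, unrelated lines simply get no intraline highlight:

```python
import difflib

RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"
HL_ON, HL_OFF = "\033[7m", "\033[27m"  # reverse video on/off


def apply_guide(text: str, guide: str) -> str:
    """Wrap runs of characters in text whose guide position is not blank."""
    out, inside = [], False
    for i, ch in enumerate(text):
        marked = i < len(guide) and guide[i] not in " \t"
        if marked != inside:  # entering or leaving a highlighted run
            out.append(HL_ON if marked else HL_OFF)
            inside = marked
        out.append(ch)
    if inside:
        out.append(HL_OFF)
    return "".join(out)


def colorize_ndiff(old: str, new: str) -> str:
    lines = [
        ln.rstrip("\n")
        for ln in difflib.ndiff(old.splitlines(keepends=True), new.splitlines(keepends=True))
    ]
    out = []
    for i, line in enumerate(lines):
        prefix, body = line[:2], line[2:]
        if prefix == "? ":
            continue  # already used as the guide for the line above it
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        guide = nxt[2:] if nxt.startswith("? ") else ""
        if prefix == "- ":
            out.append(RED + prefix + apply_guide(body, guide) + RESET)
        elif prefix == "+ ":
            out.append(GREEN + prefix + apply_guide(body, guide) + RESET)
        else:
            out.append(line)
    return "\n".join(out)


old = '{\n    "name": "Alice"\n}'
new = '{\n    "name": "Alicia"\n}'
print(colorize_ndiff(old, new))
```

This highlights characters rather than whole words; the guide markers come from `ndiff`'s own character-level matching.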

Aleksandr Levchuk
LeoDog896
  • It's important to add `sort_keys=True` to json.dumps to not be susceptible to random ordering of keys. This is great and the core of what I needed, Thank you! It would be even more helpful if for larger strings within json if word differences are highlighted. It's like a diff within a diff :) – Aleksandr Levchuk Nov 08 '22 at 19:23
  • Thanks for the revision! It'd be extra user friendly if the diff truncates long runs of unmodified json. Similar to how GitHub, git diff, and hg diff do it. Rationale: for large json where only a few places get edited it's time consuming for a human to scroll thru all of it to review each modification. – Aleksandr Levchuk Nov 09 '22 at 14:55
  • oh that was easy, just had to replace `ndiff` with `unified_diff(a, b, fromfile="before", tofile="after")` – Aleksandr Levchuk Nov 09 '22 at 15:22
  • Were you able to achieve what you were looking for? – LeoDog896 Nov 12 '22 at 14:12
  • `lines = ''.join(lines)` followed by `lines.splitlines()` this seems strange and sometimes a line will not have a "\n" at the end so this will end up fusing lines. Did you mean to use `.rstrip()` instead? – Aleksandr Levchuk Nov 15 '22 at 13:13
  • also the codes don't reset boldness, i found this to work instead: `RED: Callable[[str], str] = lambda text: f"\u001b[31m{text}\033\u001b[0m"` and `32m` for green – Aleksandr Levchuk Nov 15 '22 at 13:33
  • When lines start with "?" why skip them? – Aleksandr Levchuk Nov 15 '22 at 13:39
  • They didn't fit with the idea you had -- but I would remove them personally since they're better for CLI output. – LeoDog896 Nov 15 '22 at 13:40
  • oh i see, `unified_diff` does not print "?"'s - i switched to `unified_diff` (rationale in my Nov 9 comment) – Aleksandr Levchuk Nov 15 '22 at 13:57
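Following up on the switch to `unified_diff` discussed in the comments: a sketch that colors `unified_diff` output and keeps only `n` lines of context around each change, so long unchanged stretches of JSON collapse into `@@` hunk headers, similar to git diff. The helper name `json_unified_diff` and the cyan hunk headers are assumptions for illustration:

```python
import difflib
import json

RED, GREEN, CYAN, RESET = "\033[31m", "\033[32m", "\033[36m", "\033[0m"


def json_unified_diff(old, new, context: int = 3) -> str:
    # Canonical serialization: key order and original whitespace don't matter.
    a = json.dumps(old, indent=4, sort_keys=True).splitlines()
    b = json.dumps(new, indent=4, sort_keys=True).splitlines()
    out = []
    # lineterm="" because the lines above carry no trailing newlines.
    for line in difflib.unified_diff(a, b, fromfile="before", tofile="after",
                                     n=context, lineterm=""):
        if line.startswith(("---", "+++")):
            out.append(line)  # file headers stay uncolored
        elif line.startswith("@@"):
            out.append(CYAN + line + RESET)
        elif line.startswith("+"):
            out.append(GREEN + line + RESET)
        elif line.startswith("-"):
            out.append(RED + line + RESET)
        else:
            out.append(line)  # unchanged context line
    return "\n".join(out)


t1 = {1: 1, 3: 3, 4: 4}
t2 = {1: 1, 3: 3, 5: 5, 6: 6}
print(json_unified_diff(t1, t2))
```

Passing `context=0` shows only the changed lines; larger values show more surrounding JSON. Identical inputs produce an empty string, because `unified_diff` yields nothing (not even headers) when there are no changes.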