174

Are there any command-line utilities that can be used to find whether two JSON files are identical, ignoring the order of keys within objects (dictionaries) and the order of elements within lists?

Could this be done with jq or some other equivalent tool?

Examples:

These two JSON files are identical

A:

{
  "People": ["John", "Bryan"],
  "City": "Boston",
  "State": "MA"
}

B:

{
  "People": ["Bryan", "John"],
  "State": "MA",
  "City": "Boston"
}

but these two JSON files are different:

A:

{
  "People": ["John", "Bryan", "Carla"],
  "City": "Boston",
  "State": "MA"
}

C:

{
  "People": ["Bryan", "John"],
  "State": "MA",
  "City": "Boston"
}

That is, the command should behave like this:

$ some_diff_command A.json B.json

$ some_diff_command A.json C.json
The files are not structurally identical
Jakob
Amelio Vazquez-Reina

9 Answers

229

If your shell supports process substitution (Bash-style syntax follows; see your shell's docs):

diff <(jq --sort-keys . A.json) <(jq --sort-keys . B.json)

Object key order will be ignored, but array order will still matter. It is possible to work around that, if desired, by sorting array values in some other way, or by making them set-like (e.g. ["foo", "bar"] → {"foo": null, "bar": null}; this will also remove duplicates).

Alternatively, substitute some other comparator for diff, e.g. cmp, colordiff, or vimdiff, depending on your needs. If all you want is a yes-or-no answer, consider using cmp and passing --compact-output to jq so it skips pretty-printing, which may give a small performance increase.
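If you want the same check without jq (for example inside a test suite), the following Python sketch roughly mirrors `jq -S -c .` followed by `cmp`: serialize each document with sorted keys and compact separators, then compare the strings. As with the jq version, array order still matters. `as_set` is a hypothetical helper illustrating the set-like workaround above; it assumes the list holds strings, because JSON object keys must be strings.

```python
import json

def canonical_text(doc):
    # Rough analogue of `jq -S -c .`: keys sorted, no insignificant whitespace.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))

def as_set(strings):
    # Hypothetical helper for the set-like workaround:
    # ["foo", "bar"] -> {"foo": None, "bar": None}; duplicates collapse.
    return {s: None for s in strings}

a = {"People": ["John", "Bryan"], "City": "Boston", "State": "MA"}
b = {"People": ["Bryan", "John"], "State": "MA", "City": "Boston"}

# Key order is ignored, but the different array order is still detected.
print(canonical_text(a) == canonical_text(b))  # False

# After turning the string lists into set-like objects, the documents match.
a["People"] = as_set(a["People"])
b["People"] = as_set(b["People"])
print(canonical_text(a) == canonical_text(b))  # True
```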

Andrew Marshall
Erik
  • Note that this seems to require version 1.5 or later of `jq` – Adam Baxter Aug 18 '16 at 05:31
  • @voltagex From looking at the online manual (https://stedolan.github.io/jq/manual/v1.4/#Invokingjq) it seems that it was actually added in 1.4, although I don't know if `jq` supports POSIX-style arguments, so you may have to invoke `jq -c -S ...` – Erik Aug 18 '16 at 14:46
  • A cleaner, visual form IMO is `vimdiff <(jq -S . a.json) <(jq -S . b.json)` – Ashwin Jayaprakash Dec 08 '16 at 01:25
  • Yeah, you should remove the `-c` (which makes output compact); style preferences aren't relevant to your answer. – odinho - Velmont Jan 26 '17 at 09:06
  • @odinho-Velmont @Ashwin Jayaprakash It's true that the `-c` isn't strictly necessary, but to me there's no reason for cmp to compare identical whitespace, and no reason for jq to bother emitting it. `diff`, `vimdiff`, or any tool that does file comparison will work, but `cmp` is all that's necessary. – Erik Jan 26 '17 at 17:45
  • @Erik Okay, I see and accept the reasoning with `cmp`. To be helpful for quick copy-paste, you could possibly include a visual diff too (e.g. using `diff`), though :) – odinho - Velmont Jan 26 '17 at 19:13
93

Use jd with the -set option:

No output means no difference.

$ jd -set A.json B.json

Differences are shown as an @ path and + or -.

$ jd -set A.json C.json

@ ["People",{}]
+ "Carla"

The output diffs can also be used as patch files with the -p option.

$ jd -set -o patch A.json C.json; jd -set -p patch B.json

{"City":"Boston","People":["John","Carla","Bryan"],"State":"MA"}

https://github.com/josephburnett/jd#command-line-usage
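To make "set semantics" concrete, here is a small Python sketch of a diff that treats arrays as sets. It is not jd's implementation and does not reproduce jd's path or patch format; `set_diff` and its `(path, sign, value)` tuples are hypothetical. It assumes array elements are compared as whole values, so a changed object inside an array shows up as one removal plus one addition.

```python
import json

def canon(value):
    # Canonical text of a value, so array elements can be compared as set members.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))

def set_diff(a, b, path=()):
    """Yield (path, sign, value) for each way b differs from a.

    Objects are compared key by key; arrays are compared as sets, so element
    order and duplicates are ignored.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            if key not in b:
                yield (path + (key,), "-", a[key])
            elif key not in a:
                yield (path + (key,), "+", b[key])
            else:
                yield from set_diff(a[key], b[key], path + (key,))
    elif isinstance(a, list) and isinstance(b, list):
        left = {canon(x): x for x in a}
        right = {canon(x): x for x in b}
        for k in sorted(left.keys() - right.keys()):
            yield (path, "-", left[k])
        for k in sorted(right.keys() - left.keys()):
            yield (path, "+", right[k])
    elif canon(a) != canon(b):
        # Different scalars, or values of different types: replace a with b.
        yield (path, "-", a)
        yield (path, "+", b)

a = {"People": ["John", "Bryan", "Carla"], "City": "Boston", "State": "MA"}
c = {"People": ["Bryan", "John"], "State": "MA", "City": "Boston"}
for p, sign, value in set_diff(a, c):
    print("@", json.dumps(list(p)), sign, json.dumps(value))
# @ ["People"] - "Carla"
```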

Joe Burnett
  • So underrated it should be a misdemeanor. Gives an actual `diff` formatting-compatible output. Amazing. – ijoseph Nov 03 '20 at 23:05
  • You can use the command line tool, or the web tool: http://play.jd-tool.io/ – Joe Burnett Nov 18 '20 at 17:51
  • This is the holy grail tool for futzing with `json` (and `yaml`, after conversion) configs to see exactly why one's config is not working compared to someone else's. – ijoseph Nov 18 '20 at 20:07
  • This is exactly what I was looking for. However, there should be an option to toggle whether order within list contents matters. – Prashant Sharma Feb 12 '21 at 07:35
  • That's what the `-set` flag is for. If you include it, order within lists doesn't matter. If you don't, it will matter (default). – Joe Burnett Feb 13 '21 at 09:44
  • By the way ijoseph, I added a `-yaml` option so `jd` can read and write yaml as well as json. Try it out at http://play.jd-tool.io/ – Joe Burnett Feb 13 '21 at 09:59
  • I'd like to try this from the CLI, but I am a Go newbie and can't figure out how to run it. After running `go get` to install it, `jd` is still not found. I downloaded the latest release from GitHub, ran `chmod +x` to make it runnable, and only got "exec format error: ./jd" when trying to run it from the CLI. On OSX, if it matters. – jonnybot Mar 30 '21 at 16:25
  • I was building only for Linux. But since you asked, I've cross-compiled the latest release: https://github.com/josephburnett/jd/releases/tag/v1.4.0. Download jd-amd64-darwin, which should work on OSX. – Joe Burnett Apr 02 '21 at 20:33
  • This is awesome; it is a sin that `jd` isn't better known to the general public. – gented Apr 27 '21 at 23:01
  • Using Homebrew on macOS: `brew install jd` – Zac Thompson Jun 07 '21 at 17:29
47

Since jq's comparison already compares objects without taking into account key ordering, all that's left is to sort all lists inside the object before comparing them. Assuming your two files are named a.json and b.json, on the latest jq nightly:

jq --argfile a a.json --argfile b b.json -n '($a | (.. | arrays) |= sort) as $a | ($b | (.. | arrays) |= sort) as $b | $a == $b'

This program should return "true" or "false" depending on whether or not the objects are equal using the definition of equality you ask for.

EDIT: The (.. | arrays) |= sort construct doesn't actually work as expected on some edge cases. This GitHub issue explains why and provides some alternatives, such as:

def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); (post_recurse | arrays) |= sort
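One way to see why traversal order matters (a hedged illustration, not a reproduction of jq's internals or of the GitHub issue): if an outer array is sorted before its inner arrays, its order is decided using the inner arrays' original contents, so the final result can end up unsorted. The Python functions below are hypothetical and handle only nested arrays of numbers.

```python
def sort_pre_order(v):
    # Sort an array first, then its children: the outer order is decided
    # using the children's original, unsorted contents.
    if isinstance(v, list):
        return [sort_pre_order(x) for x in sorted(v)]
    return v

def sort_post_order(v):
    # Sort the children first, then the array itself (innermost first).
    if isinstance(v, list):
        return sorted(sort_post_order(x) for x in v)
    return v

doc = [[2, 1], [1, 3]]
print(sort_pre_order(doc))   # [[1, 3], [1, 2]]  (outer array no longer sorted)
print(sort_post_order(doc))  # [[1, 2], [1, 3]]
```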

Applied to the jq invocation above:

jq --argfile a a.json --argfile b b.json -n 'def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); ($a | (post_recurse | arrays) |= sort) as $a | ($b | (post_recurse | arrays) |= sort) as $b | $a == $b'
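The same approach (sort every array, then compare with an equality that ignores object key order) can be sketched in Python. Sorting arrays with mixed element types needs an explicit key; `jq_order` below is an assumption that approximates jq's documented ordering (null < false < true < numbers < strings < arrays < objects), and all function names are hypothetical. Arrays are sorted innermost first, the same post-order traversal that `post_recurse` performs.

```python
def jq_order(v):
    """Sort key approximating jq's ordering of JSON values:
    null < false < true < numbers < strings < arrays < objects."""
    if v is None:
        return (0,)
    if v is False:
        return (1,)
    if v is True:
        return (2,)
    if isinstance(v, (int, float)):
        return (3, v)
    if isinstance(v, str):
        return (4, v)
    if isinstance(v, list):
        return (5, [jq_order(x) for x in v])
    # Objects: compare the sorted key lists first, then the values in key order.
    keys = sorted(v)
    return (6, keys, [jq_order(v[k]) for k in keys])

def sort_arrays(v):
    # Post-order traversal: normalize children before sorting the enclosing array.
    if isinstance(v, list):
        return sorted((sort_arrays(x) for x in v), key=jq_order)
    if isinstance(v, dict):
        return {k: sort_arrays(x) for k, x in v.items()}
    return v

def equivalent(a, b):
    # Comparing jq_order keys ignores object key order (keys are sorted) and,
    # unlike Python's ==, keeps true distinct from 1.
    return jq_order(sort_arrays(a)) == jq_order(sort_arrays(b))

a = {"People": ["John", "Bryan"], "City": "Boston", "State": "MA"}
b = {"People": ["Bryan", "John"], "State": "MA", "City": "Boston"}
c = {"People": ["Bryan", "John", "Carla"], "State": "MA", "City": "Boston"}
print(equivalent(a, b))  # True
print(equivalent(a, c))  # False
```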
UrsinusTheStrong
12

Pulling in the best from the top two answers to get a jq based json diff:

diff \
  <(jq -S 'def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); (. | (post_recurse | arrays) |= sort)' "$original_json") \
  <(jq -S 'def post_recurse(f): def r: (f | select(. != null) | r), .; r; def post_recurse: post_recurse(.[]?); (. | (post_recurse | arrays) |= sort)' "$changed_json")

This takes the elegant array-sorting solution from https://stackoverflow.com/a/31933234/538507 (which makes array order irrelevant, so arrays behave like sets apart from duplicates) and the clean Bash process substitution into diff from https://stackoverflow.com/a/37175540/538507. This addresses the case where you want a diff of two JSON files and the order of the array contents is not relevant.
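A rough Python analogue of this pipeline, using only the standard library: normalize each document, pretty-print it with sorted keys, and pass both texts to `difflib.unified_diff`. As an assumption, arrays are ordered by each element's canonical JSON text rather than by jq's ordering; that order is arbitrary but deterministic, which is all a diff needs. `json_diff` and its default file names are hypothetical.

```python
import difflib
import json

def normalize(v):
    # Sort arrays innermost first, ordering elements by their canonical JSON text.
    if isinstance(v, list):
        items = [normalize(x) for x in v]
        return sorted(items, key=lambda x: json.dumps(x, sort_keys=True))
    if isinstance(v, dict):
        return {k: normalize(x) for k, x in v.items()}
    return v

def normalized_lines(doc):
    # Pretty-print with sorted keys (like `jq -S .`), keeping line endings for difflib.
    text = json.dumps(normalize(doc), sort_keys=True, indent=2) + "\n"
    return text.splitlines(keepends=True)

def json_diff(a, b, name_a="original.json", name_b="changed.json"):
    # Unified diff of the normalized documents; an empty string means no difference.
    return "".join(difflib.unified_diff(
        normalized_lines(a), normalized_lines(b), fromfile=name_a, tofile=name_b))

a = {"People": ["John", "Bryan", "Carla"], "City": "Boston", "State": "MA"}
c = {"People": ["Bryan", "John"], "State": "MA", "City": "Boston"}
print(json_diff(a, c), end="")
```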

Andrew
8

Here is a solution using the generic function walk/1:

# Apply f to composite entities recursively, and to atoms
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
  elif type == "array" then map( walk(f) ) | f
  else f
  end;

def normalize: walk(if type == "array" then sort else . end);

# Test whether the input and argument are equivalent
# in the sense that ordering within lists is immaterial:
def equiv(x): normalize == (x | normalize);

Example:

{"a":[1,2,[3,4]]} | equiv( {"a": [[4,3], 2,1]} )

produces:

true

And wrapped up as a bash script:

#!/bin/bash

JQ=/usr/local/bin/jq
BN=$(basename "$0")

function help {
  cat <<EOF

Syntax: $0 file1 file2

The two files are assumed each to contain one JSON entity.  This
script reports whether the two entities are equivalent in the sense
that their normalized values are equal, where normalization of all
component arrays is achieved by recursively sorting them, innermost first.

This script assumes that the jq of interest is $JQ if it exists and
otherwise that it is on the PATH.

EOF
  exit
}

if [ ! -x "$JQ" ] ; then JQ=jq ; fi

function die     { echo "$BN: $*" >&2 ; exit 1 ; }

if [ $# != 2 -o "$1" = -h  -o "$1" = --help ] ; then help ; exit ; fi

test -f "$1" || die "unable to find $1"
test -f "$2" || die "unable to find $2"

"$JQ" -r -n --argfile A "$1" --argfile B "$2" -f <(cat<<"EOF"
# Apply f to composite entities recursively, and to atoms
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
  elif type == "array" then map( walk(f) ) | f
  else f
  end;

def normalize: walk(if type == "array" then sort else . end);

# Test whether the input and argument are equivalent
# in the sense that ordering within lists is immaterial:
def equiv(x): normalize == (x | normalize);

if $A | equiv($B) then empty else "\($A) is not equivalent to \($B)" end

EOF
)

POSTSCRIPT: walk/1 is a built-in in versions of jq > 1.5, and can therefore be omitted if your jq includes it, but there is no harm in including it redundantly in a jq script.

POST-POSTSCRIPT: The builtin version of walk has recently been changed so that it no longer sorts the keys within an object. Specifically, it uses keys_unsorted. For the task at hand, the version using keys should be used.
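For comparison, here is a minimal Python sketch of the same `walk`/`normalize`/`equiv` structure. `walk` applies `f` bottom-up, so inner arrays are already sorted by the time their parent is, and it rebuilds objects with sorted keys, matching the `keys`-based definition recommended above. As an assumption, arrays are ordered by each element's canonical JSON text rather than by jq's built-in ordering; any deterministic total order works for an equivalence test. `sort_if_array` is a hypothetical helper name.

```python
import json

def walk(f, v):
    # Apply f to objects and arrays after their members, and to atoms.
    if isinstance(v, dict):
        # Rebuild objects with sorted keys, like the `keys`-based definition.
        return f({k: walk(f, v[k]) for k in sorted(v)})
    if isinstance(v, list):
        return f([walk(f, x) for x in v])
    return f(v)

def sort_if_array(x):
    # Order array elements by canonical JSON text; leave everything else alone.
    if isinstance(x, list):
        return sorted(x, key=lambda e: json.dumps(e, sort_keys=True))
    return x

def normalize(v):
    return walk(sort_if_array, v)

def equiv(a, b):
    # a and b are equivalent if they are equal once every array is sorted.
    return normalize(a) == normalize(b)

print(equiv({"a": [1, 2, [3, 4]]}, {"a": [[4, 3], 2, 1]}))  # True
```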

peak
  • Thank you for mentioning that `walk` was added in jq 1.5. I have been wishing for a compromise operator between `filter` and `map` and it looks like this is it. – Noah Sussman Aug 17 '17 at 16:26
6

There's an answer for this here that would be useful.

Essentially you can use the Git diff functionality (even for non-Git tracked files) which also includes colour in the output:

git diff --no-index payload_1.json payload_2.json

Maikon
    This is sensitive to order, which the OP wanted to ignore – Andreas Dec 06 '19 at 14:36
  • This is only a colorized text diff (at least in git version 2.30.2), it doesn't understand JSON semantics. You can check this yourself by producing minimized copies of your JSON data (`jq -cSM`) and git diff'ing them. – ppar Jun 02 '23 at 18:07
2

One more tool, for those for whom the previous answers are not a good fit: jdd.

It's HTML-based, so you can either use it online at www.jsondiff.com or, if you prefer running it locally, just download the project and open index.html.

Acapulco
1

Perhaps you could use this sort-and-diff tool: http://novicelab.org/jsonsortdiff/, which first sorts the objects semantically and then compares them. It is based on https://www.npmjs.com/package/jsonabc

Shivraj
0

In JSONiq, you can simply use the deep-equal function:

deep-equal(
  {
    "People": ["John", "Bryan", "Carla"],
    "City": "Boston",
    "State": "MA"
  },
  {
    "People": ["Bryan", "John"],
    "State": "MA",
    "City": "Boston"
  }
)

which returns

false

You can also read from files (locally or an HTTP URL also works) like so:

deep-equal(
  json-doc("path to doc A.json"),
  json-doc("path to doc B.json")
)

A possible implementation is RumbleDB.

However, you need to be aware that it is not quite correct that the first two documents are the same: JSON defines arrays as ordered lists of values.

["Bryan", "John"]

is not the same as:

["John", "Bryan"]
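The same distinction appears in general-purpose JSON libraries. In Python, for example, parsed objects become dicts, whose equality ignores key order, and arrays become lists, whose equality respects element order, so plain `==` treats the question's first pair of documents as different for the reason given here:

```python
import json

a = json.loads('{"People": ["John", "Bryan"], "City": "Boston", "State": "MA"}')
b = json.loads('{"People": ["Bryan", "John"], "State": "MA", "City": "Boston"}')

# Object key order does not affect equality...
print({"City": "Boston", "State": "MA"} == {"State": "MA", "City": "Boston"})  # True
# ...but array order does, so the question's first A and B are not equal as parsed.
print(["Bryan", "John"] == ["John", "Bryan"])  # False
print(a == b)  # False
```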
Ghislain Fourny