1

Is there a method to obtain a diff for JSON Lines files? In case there's confusion, by "JSON Lines", I mean the format described here, which basically requires that every line is a valid JSON structure. Anyway, there's an answer here that discusses using jq in order to diff two different JSON files.

However, that question wanted the diff to ignore within-list ordering, whereas I do care about that ordering. In addition, the answers there contain jq scripts that only give a true or false response rather than a full diff. Ideally, I'd like a full diff. There is a project called json-diff that does diff JSON files, but it only works on a single JSON entity, not on JSON Lines.

To reiterate, is there a method, such as a jq script, that can produce a diff of two JSON Lines files?

peak
wyer33
  • If the only limitation for using json-diff is that it operates over actual JSON objects instead of over individual lines, `jq -s .` should wrap them in an array for you. – Feb 18 '16 at 00:43
  • When you say you'd like a "full diff", what would this entail exactly? A textual diff, as in `json-diff`? –  Feb 18 '16 at 00:44
  • @SantiagoLapresta Yes, I mean a textual diff as in `json-diff`. I actually didn't know about `-s/--slurp` and that may do it. I just ran the test command `json-diff.js <(jq -s . a.jsonl) <(jq -s . b.jsonl)` and that basically does it. If there's a better way, I'd like to hear it. Otherwise, if you add that as an answer, I'll accept it. – wyer33 Feb 18 '16 at 03:21

2 Answers

2

If I understand the question correctly, the following should do the job. I'll assume you have access to jq 1.5, which includes the filter walk/1, and that you have a reasonably modern Mac or Linux-like shell. (If your jq lacks walk/1, it's easy to add its definition to the file below; the definition can be found on the web, e.g. in jq's src/builtin.jq file.)

(1) Create a file called (let's say) jq-diff.jq with these two lines:

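# Rebuild an object with its entries sorted by key (arrays are not reordered).
def sortKeys: to_entries | sort | from_entries;
# Apply sortKeys to every object, at any nesting depth.
walk( if type == "object" then sortKeys else . end )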

(2) Assuming the two files with JSON entities in them are FILE1 and FILE2, run one of the following commands. The first sorts the normalized lines, so the order of the JSON entities within each file is ignored; the second preserves that order:

diff <(jq -cf jq-diff.jq FILE1 | sort) <(jq -cf jq-diff.jq FILE2 | sort)

# OR:

diff <(jq -cf jq-diff.jq FILE1) <(jq -cf jq-diff.jq FILE2)

Brief explanation:

The role of jq here is to sort the keys in the objects (without sorting the arrays) and to print them in a standard way, one per line (courtesy of the -c option).
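For readers who want to see the same idea without jq, here is a minimal Python sketch of the normalize-then-diff approach. It is an illustration only, not the jq script above: the function names `canonical_lines` and `jsonl_diff` and the `sort_lines` parameter are made up for this example, it assumes exactly one JSON value per non-blank line, and its output is not guaranteed to be byte-identical to what `jq -c` prints (for instance, non-ASCII characters are escaped by default).

```python
import difflib
import json


def canonical_lines(text):
    """Return one compact, key-sorted JSON string per non-blank input line."""
    return [
        # sort_keys orders object keys at every depth; arrays keep their order.
        json.dumps(json.loads(line), sort_keys=True, separators=(",", ":"))
        for line in text.splitlines()
        if line.strip()
    ]


def jsonl_diff(a_text, b_text, sort_lines=False):
    """Unified diff of two JSON Lines documents after canonicalization."""
    a, b = canonical_lines(a_text), canonical_lines(b_text)
    if sort_lines:
        # Ignore the order of the entities within each document.
        a, b = sorted(a), sorted(b)
    return list(difflib.unified_diff(a, b, "FILE1", "FILE2", lineterm=""))


if __name__ == "__main__":
    left = '{"b": [2, 1], "a": 1}\n{"x": {"z": 0, "y": 1}}\n'
    right = '{"a": 1, "b": [1, 2]}\n{"x": {"y": 1, "z": 0}}\n'
    print("\n".join(jsonl_diff(left, right)))
```

In this sample the second line differs only in key order, so it is reported as unchanged, while the first line is reported as changed because the array elements are in a different order.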

peak
1

You can use jq's -s (--slurp) flag to read your newline-separated JSON objects into a single JSON array containing them, which makes each file one JSON entity that json-diff can compare.
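To make the effect of slurping concrete, here is a small Python sketch of what it does to a JSON Lines input. The helper name `slurp` is hypothetical, and the sketch assumes one JSON value per non-blank line, whereas jq itself accepts any stream of JSON values.

```python
import json


def slurp(text):
    """Parse JSON Lines text into a single list, one element per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    jsonl = '{"id": 1}\n{"id": 2, "tags": ["a", "b"]}\n'
    # Serialize the list as one JSON array, which single-document tools can read.
    print(json.dumps(slurp(jsonl), indent=2))
```

The result is a single JSON document, so any tool that expects one JSON entity per file can then be applied to it.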

  • According to the json-diff documentation, json-diff compares `only the json structure (keys), ignoring the values`. – peak Mar 12 '21 at 08:59