
Is there some accepted 'best practice' for generating JSON documents using bash and jq? I have a script that gathers various data, and to make it easier to process further with other tools I'd like to output the data in JSON format. So I'm using jq to make sure all the quoting etc. gets done correctly, as recommended in this answer: https://stackoverflow.com/a/48470227/75652. However, I'm struggling with how to generate the JSON piecemeal instead of in one giant jq call at the end. E.g. something like


read foo <<<$(</path/to/some/oneliner/file)
jq -n --arg f $foo '{foo: $f}'

bar=$(some_command)
jq -n --arg b $bar '{bar: $b}'

will generate two separate objects (which can be processed with tools that support the various more or less informal "JSON streaming" formats, including jq itself), whereas I want a single object, something like


{ "foo": SOMETHING, "bar": SOMETHING_ELSE }

but I can't do that with multiple jq calls as jq will complain that the incomplete JSON is malformed.

And to further add some complexity, in some cases I need to generate nested JSON structures. In another language like Python I'd just put all the data in a set of nested dictionaries and then dump it to JSON at the end, but nested dictionaries in bash seem very tedious.
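
For reference, the "one giant jq call at the end" that I'm trying to avoid would look something like the following (some_other_command and the nested layout are made up for illustration):

foo=$(</path/to/some/oneliner/file)
bar=$(some_command)
baz=$(some_other_command)

# everything has to be collected up front, then serialized in one shot
jq -n --arg f "$foo" --arg b "$bar" --arg z "$baz" \
  '{foo: $f, nested: {bar: $b, baz: $z}}'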

janneb
  • And to further add some complexity, in some cases I need to generate nested JSON structures. What would be an example of that? – Inian Oct 01 '21 at 18:52
  • IMO shell is not a good choice for dealing with any serialized data format. Yes, we have tools like `jq` and `xmllint`, but they are cumbersome to use outside of very basic situations. – jordanm Oct 01 '21 at 18:53
  • `jq -n --arg f "$foo" --arg b "$bar" '{foo: $f, bar: $b}'`? – Shawn Oct 01 '21 at 19:30
  • *nested dictionaries in bash seem very tedious* Especially since bash only has one-level associative arrays. – Shawn Oct 01 '21 at 19:32
  • http://shellcheck.net/ is always a good first stop before coming here; it would have pointed out the quoting bugs. – Charles Duffy Oct 01 '21 at 19:40
  • "Piecemeal instead of one giant call at the end" is generally counterproductive. The more distinct external processes you call, the more overhead you're eating in spinning up those processes. It's more efficient to minimize their number, or -- ideally -- to have only one. – Charles Duffy Oct 01 '21 at 19:45
  • (The same thing is also true of embedding awk in bash: if you can move a whole loop into awk instead of having bash call awk a separate time each time it goes through a loop, that's _vastly_ faster, typically by several orders of magnitude; the same goes for jq.) – Charles Duffy Oct 01 '21 at 20:00
  • @CharlesDuffy: shellcheck is indeed an excellent tool that I frequently use. In this case, however, I was referring to quoting in JSON, which is why the --arg dance with jq is preferred to just echoing JSON directly from bash. – janneb Oct 01 '21 at 20:25
  • @janneb The bug he's referring to is not quoting the expansions of `$foo` and `$bar` in your `jq` invocations. See my earlier comment for the correct way. – Shawn Oct 01 '21 at 20:56

3 Answers


When things reach a certain complexity (or when I need to externally process some of the data between transformations), I typically end up using something along the lines of

jq --slurpfile foo <(

  # first inner shell script
  read -r foo <<<"$(</path/to/some/oneliner/file)"
  jq -n --arg f "$foo" '{foo: $f}'

) --slurpfile bar <(

  # second inner shell script
  bar=$(some_command)
  jq -n --arg b "$bar" '{bar: $b}'

) -n '$foo[0] + $bar[0]'

That way, the outermost jq call may still have a 'real' input of its own, and the inner calls remain fairly maintainable, with all bash variables in scope.
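
If you also need the nested structures mentioned in the question, the outermost jq program is a natural place to build them; a minimal sketch, assuming a made-up `config` key:

jq --slurpfile foo <(
  jq -n --arg f "$foo" '{foo: $f}'
) --slurpfile bar <(
  jq -n --arg b "$bar" '{bar: $b}'
) -n '{config: ($foo[0] + $bar[0])}'
# => {"config": {"foo": "...", "bar": "..."}}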

pmf

You can save the intermediary JSON and process it further with the next jq command:

#!/usr/bin/env bash

# read the first line of a.txt into foo
read -r foo <a.txt

# build the initial JSON object and save it in a shell variable
json="$(jq -n --arg f "$foo" '{foo: $f}')"

bar="$(pwd)"

# splice the saved JSON into the jq program and merge in the new key
jq -n --arg b "$bar" "$json"'+{bar: $b}'

# or alternatively, feed the saved JSON on stdin and extend it
jq --arg b "$bar" '.bar=$b' <<<"$json"
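
This incremental style also handles nesting, since jq creates any missing intermediate objects on path assignment; a small sketch (the `nested` key is invented for illustration):

jq --arg b "$bar" '.nested.bar = $b' <<<"$json"
# => { "foo": "...", "nested": { "bar": "..." } }
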
Léa Gris

The Q makes it seem that $foo and $bar can be pre-computed, in which case you can use the following as a model:

jq -n --arg f "$foo" --arg b "$bar" '.foo = $f | .bar = $b' 

Of course if the value of $foo is very large, it would be better to make those values available to jq using a file-oriented command-line option, such as --slurpfile.
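
For instance, a minimal sketch, assuming the pre-computed value has already been written as JSON to a (hypothetical) file foo.json:

jq -n --slurpfile f foo.json --arg b "$bar" '.foo = $f[0] | .bar = $b'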

If the computation of some of the values depends on very large files, then invoking jq several times might make sense. In that case, making N calls to jq to marshal the values, then making one extra call to assemble them into a single JSON object (perhaps using `jq -s add`) seems very reasonable.
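
A minimal sketch of that marshal-then-assemble pattern (the part*.json file names are placeholders):

jq -n --arg f "$foo" '{foo: $f}' > part1.json
jq -n --arg b "$bar" '{bar: $b}' > part2.json
jq -s add part1.json part2.json
# => {"foo": "...", "bar": "..."}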

An alternative along the lines suggested in the title of the Q would be to create a pipeline of calls to jq, e.g.:

  jq -n --argfile f <(some nasty stuff) '.foo = $f' |
    jq --argfile b <(some more nasty stuff) '.bar = $b' | ...

Finally, if $bar depends on $foo in some way, and that dependence can be expressed in jq, you could read in the underlying values and compute both in a single invocation, using a more complex jq program.
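
To illustrate that last case, suppose (purely for the sake of example) that $bar is just $foo upper-cased; then a single invocation suffices:

jq -n --arg f "$foo" '{foo: $f, bar: ($f | ascii_upcase)}'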

peak