
Given the input document:

{"a":1}
{"b":2}
{"c":3,"d":4}

What is the difference between the following jq programs (if any)? They all seem to produce the same output.

  1. jq '[., inputs] | map(to_entries[].value)'
  2. jq -n '[inputs] | map(to_entries[].value)'
  3. jq -s 'map(to_entries[].value)'

In other words, the following (simplified/reduced) invocations seem identical:

  • jq '[.,inputs]'
  • jq -n '[inputs]'
  • jq -s '.'

How are they different? Are there scenarios where one works, but the others don't? Did older versions of jq not support all of them? Is it performance related? Or simply a matter of readability and personal preference?


Bonus points (added later to the question): does the same hold true for the following programs?

  4. jq '., inputs | to_entries[].value'
  5. jq -n 'inputs | to_entries[].value'
  6. jq -s '.[] | to_entries[].value'
  7. jq 'to_entries[].value'
knittl

3 Answers

With jq -n '[inputs] ...' and jq '[., inputs] ...', you are loading the whole file into memory.

A more memory-efficient way to achieve the result as an array is:

jq -n '[inputs | to_entries[].value]'
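As a rough check, the two forms can be run side by side on the sample input from the question. This is only a sketch: `sample` is a hypothetical helper that prints the three documents, and `-c` is added just to get compact one-line output.

```shell
# Hypothetical helper: prints the three sample documents from the question.
sample() { printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}'; }

# Collects the complete objects into an array first, then extracts the values.
collected=$(sample | jq -n -c '[inputs] | map(to_entries[].value)')

# Extracts the values from each object as it is read, so only the values are collected.
streamed=$(sample | jq -n -c '[inputs | to_entries[].value]')

printf '%s\n' "$collected"   # [1,2,3,4]
printf '%s\n' "$streamed"    # [1,2,3,4]
```

Both produce the same array; the difference lies only in what has to be held in memory while it is built.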
peak
Philippe

Those first three programs are equivalent, both functionally and in terms of resource utilization, but they obscure the difference between array-oriented and stream-oriented programming.

In a nutshell, think sed and awk. For more details, see e.g. my "A Stream-oriented Introduction to jq", and in particular the section "On the importance of inputs".
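For the sample input from the question, the equivalence of the first three programs can be checked with a sketch like the following. `sample` is a hypothetical helper that prints the three documents, and `-c` is only there for compact output.

```shell
# Hypothetical helper: prints the three sample documents from the question.
sample() { printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}'; }

# (1) first document via `.`, the remaining ones via `inputs`
p1=$(sample | jq -c '[., inputs] | map(to_entries[].value)')
# (2) no implicit input (-n); every document comes from `inputs`
p2=$(sample | jq -n -c '[inputs] | map(to_entries[].value)')
# (3) jq itself slurps all documents into one array (-s)
p3=$(sample | jq -s -c 'map(to_entries[].value)')

printf '%s\n' "$p1" "$p2" "$p3"   # three times [1,2,3,4]
```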


Bonus points: does the same hold true for the following programs:

Referring to the last four numbered examples in the question: (4), (5) and (7) are essentially equivalent; (6) is just silly, since it slurps the whole input into an array only to iterate over that array again.
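A minimal sketch comparing the four stream-oriented programs on the question's sample input. The `sample` helper and the variable names are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: prints the three sample documents from the question.
sample() { printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}'; }

p4=$(sample | jq '., inputs | to_entries[].value')   # (4)
p5=$(sample | jq -n 'inputs | to_entries[].value')   # (5)
p6=$(sample | jq -s '.[] | to_entries[].value')      # (6) slurps first, then iterates
p7=$(sample | jq 'to_entries[].value')               # (7) filter runs once per document

# Each variable holds the values 1, 2, 3, 4 on separate lines.
printf '%s\n' "$p4"

if [ "$p4" = "$p5" ] && [ "$p5" = "$p6" ] && [ "$p6" = "$p7" ]; then
  printf 'all four outputs match\n'
fi
```

The output is the same in all four cases; only (6) has to hold the whole input in memory before it emits anything.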


If you're looking for a reason why all these variations exist, please bear in mind that input and inputs were late additions in the development of jq. Perhaps they were late additions because jq was originally envisioned as a very simple and for the most part "purely functional" language.

knittl
peak
  • Thanks for your answer. I am a little confused though, quoting the linked article: »Prior to the availability of inputs, it was necessary to “slurp” the entire file (as by using the -s command-line option), _which is inefficient, at least in the use of memory_.« I assume this only applies when using `reduce`, but I think this answer should mention it ("it" being `inputs` not requiring the whole file to be kept in memory for some cases). – knittl Sep 25 '22 at 14:01
  • To avoid misunderstandings: I see that `[inputs]` still loads everything, but `jq -n 'inputs | …'` is preferable over `jq -s '.[] | …'` – knittl Sep 25 '22 at 18:22
  • @knittl - You wrote "I assume this only applies when using reduce", but as your subsequent comment (immediately above) illustrates, that is not the case. – peak Sep 25 '22 at 19:32
  • Yes, I learned something in the 4 hours between those two comments :) I've also edited my question to include more details for future visitors who are wondering the same. – knittl Sep 25 '22 at 19:44
  • @knittl - In future, when editing a question, please find a way to do so without making answers to the original question problematic or confusing. Having two sets of distinct programs, both labeled "(1), (2), (3)", is probably not a good idea anyway.... – peak Sep 25 '22 at 21:08
  • @knittl - No, I've fixed the numbering scheme, and updated my response accordingly. Maybe we should delete our now-superfluous comments? – peak Sep 26 '22 at 05:51
  • @knittl - Yes, perhaps there is a better approach, but at least it's not confusing. – peak Sep 26 '22 at 06:02
  • Turns out markdown (at least on SO) can start lists at a different number. I've updated the Q and your A accordingly – knittl Sep 26 '22 at 06:04

Adding even more cases for the sake of completeness:

From the manual:

--raw-input/-R:

Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string.

This means that on one hand

  • jq -R -n '[inputs]' and
  • jq -R '[., inputs]'

both produce an array of strings, as each item provided by inputs (and . if it wasn't silenced by -n) corresponds to a line of text from the input document(s), whereas on the other hand

  • jq -R -s '.'

slurps all characters from the input document(s) into exactly one long string, newlines included.
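The difference can be seen with a small text input. This is a sketch only: the two-line input and the `text` helper are made up for illustration, `-c` is added for compact output, and a reasonably recent jq is assumed.

```shell
# Hypothetical two-line text input, not taken from the question.
text() { printf 'alpha\nbeta\n'; }

lines_n=$(text | jq -R -n -c '[inputs]')      # every line arrives via `inputs`
lines_dot=$(text | jq -R -c '[., inputs]')    # first line via `.`, the rest via `inputs`
slurped=$(text | jq -R -s -c '.')             # whole input as a single string

printf '%s\n' "$lines_n"     # ["alpha","beta"]
printf '%s\n' "$lines_dot"   # ["alpha","beta"]
printf '%s\n' "$slurped"     # "alpha\nbeta\n"
```

Note that the slurped string keeps the newlines (shown JSON-escaped as `\n`), while the per-line strings do not contain them.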

pmf