
I have been using jq to extract one JSON blob at a time from some relatively large files, writing the results to a file of one JSON object per line for further processing. Here is an example of the JSON format:

{
  "date": "2023-07-30",
  "results1":[
    {
      "data": [    
        {"row": [{"key1": "row1", "key2": "row1"}]},
        {"row": [{"key1": "row2", "key2": "row2"}]}
      ]
    },
    {
      "data": [    
        {"row": [{"key1": "row3", "key2": "row3"}]},
        {"row": [{"key1": "row4", "key2": "row4"}]}
      ]
    }
  ],
  "results2":[
    {
      "data": [    
        {"row": [{"key3": "row1", "key4": "row1"}]},
        {"row": [{"key3": "row2", "key4": "row2"}]}
      ]
    },
    {
      "data": [    
        {"row": [{"key3": "row3", "key4": "row3"}]},
        {"row": [{"key3": "row4", "key4": "row4"}]}
      ]
    }
  ]
}

My current approach is to run the following and redirect stdout to a file:

jq -rc ".results1[]" my_json.json

This works fine; however, it seems that jq reads the entire file into memory in order to extract the chunk I am interested in.
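In full, that means one jq pass per key, i.e. reading the same file twice (output file names here are just for illustration):

jq -rc ".results1[]" my_json.json > results1.jsonl
jq -rc ".results2[]" my_json.json > results2.jsonl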

Questions:

  1. Does jq read the entire file into memory when I execute the above statement?
  2. Assuming the answer is yes, is there a way that I can extract results1[] and results2[] in the same call to avoid reading the file twice?

I have tried the --stream option, but it is very slow. I have also read that it sacrifices speed for memory savings; memory is not an issue at this time, so I would prefer to avoid that option. Basically, I need to read the above JSON once and output two files in JSON Lines format.

Edit: (I changed the input data a bit to show the differences in the output)

Output file 1:

{"data":[{"row":[{"key1":"row1","key2":"row1"}]},{"row":[{"key1":"row2","key2":"row2"}]}]}
{"data":[{"row":[{"key1":"row3","key2":"row3"}]},{"row":[{"key1":"row4","key2":"row4"}]}]}

Output file 2:

{"data":[{"row":[{"key3":"row1","key4":"row1"}]},{"row":[{"key3":"row2","key4":"row2"}]}]}
{"data":[{"row":[{"key3":"row3","key4":"row3"}]},{"row":[{"key3":"row4","key4":"row4"}]}]}

It seems pretty well known that the streaming option is slow. See the discussion here.

My attempt at implementing it followed the answer here.
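For reference, that style of extraction looks roughly like the following (a sketch of the fromstream/truncate_stream idiom, applied to results1 only; requires jq 1.5 or later). It produces the same lines as output file 1, just much more slowly:

jq -cn --stream '
  fromstream(2 | truncate_stream(inputs | select(.[0][0] == "results1")))
' my_json.json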

fsumathguy
  • `I have used the --stream option but it is very slow.` I doubt that. Could you show how you implemented that with stream? – Inian Jul 31 '23 at 12:47
  • Also please post the _exact_ desired output and not leave it to speculation – Inian Jul 31 '23 at 12:47

2 Answers


jq doesn't have any file I/O facilities, so you can't output multiple files from a single invocation.

You can output each piece of data alongside its key and post-process the result:

jq -r '
    to_entries[]                 # iterate over the top-level key/value pairs
    | select(.key != "date")    # skip the scalar "date" field
    | .key as $k                # remember which results array we are in
    | .value[]                  # emit each element of that array
    | [$k, @json]               # pair the key with the compact JSON text
    | @tsv                      # tab-separate for easy post-processing
' my_json.json

outputs

results1    {"data":[{"row":[{"key1":"row1","key2":"row1"}]},{"row":[{"key1":"row2","key2":"row2"}]}]}
results1    {"data":[{"row":[{"key1":"row3","key2":"row3"}]},{"row":[{"key1":"row4","key2":"row4"}]}]}
results2    {"data":[{"row":[{"key3":"row1","key4":"row1"}]},{"row":[{"key3":"row2","key4":"row2"}]}]}
results2    {"data":[{"row":[{"key3":"row3","key4":"row3"}]},{"row":[{"key3":"row4","key4":"row4"}]}]}

So:

while IFS=$'\t' read -r key json; do
    printf '%s\n' "$json" >> "${key}.jsonl"
done < <(
    jq -r '...' my_json.json    # the same filter as above
)

or

jq -r '...' my_json.json | awk -F '\t' '{print $2 > ($1 ".jsonl")}'
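Two practical notes: the shell loop appends with >>, so truncate or delete any existing .jsonl files before re-running it. The awk variant does not have that problem: awk's > redirection truncates each output file the first time it is used and appends to the still-open file afterwards, so every run starts clean. With a very large number of distinct keys, though, awk would need close() calls to stay under the open-file limit.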
glenn jackman

With Bash ≥ 4, you can speed this up by processing bigger chunks: have jq also emit each array's length, then read that many lines at once with mapfile:

jq -cr '$ARGS.positional[] as $key | .[$key] | $key, length, .[]' input.json \
  --args results1 results2 |
while read -r key && read -r len; do  # read the key name, then the element count
  mapfile -t -n "$len"                # slurp that many JSON lines in one go
  printf '%s\n' "${MAPFILE[@]}" > "$key.jsonl"
done
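For the sample input, the jq command emits each key's name, its element count, and then that many JSON lines, which is exactly what the read/mapfile pairing consumes:

results1
2
{"data":[{"row":[{"key1":"row1","key2":"row1"}]},{"row":[{"key1":"row2","key2":"row2"}]}]}
{"data":[{"row":[{"key1":"row3","key2":"row3"}]},{"row":[{"key1":"row4","key2":"row4"}]}]}
results2
2
{"data":[{"row":[{"key3":"row1","key4":"row1"}]},{"row":[{"key3":"row2","key4":"row2"}]}]}
{"data":[{"row":[{"key3":"row3","key4":"row3"}]},{"row":[{"key3":"row4","key4":"row4"}]}]}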
pmf