
Consider as input a db dump (from DBeaver) with this format:

{
  "select": [
    {<row1>},
    {<row2>}
   ],
  "select": {}
}

Say I'm debugging a bigger script and just want to see the first few rows from the first statement. How can I do that efficiently with a rather huge file?

The filter

jq 'keys[0] as $k|.[$k]|limit(1;.[])' dump

isn't really great, as it needs to fetch all keys first. The filter

jq '.[0]|limit(1;.[])' dump

sadly is not valid (jq cannot index an object with a number), and

jq 'first(.[])|limit(1;.[])' dump

does not seem to have any performance benefit.

What would be the best way to access the first field in an object, without actually testing its name or caring about the rest of the fields?

Martin Mucha

2 Answers


Given that weird object with identical keys, you can use the --stream option to access all items before the JSON processor has a chance to eliminate the duplicates, fromstream and truncate_stream to dissect the input, and limit to reduce the output to just a few items:

jq --stream -cn 'limit(5; fromstream(2|truncate_stream(inputs)))' dump.json
{<row1>}
{<row2>}
{<row3>}
{<row4>}
{<row5>}
pmf
  • ok, maybe let's start with streaming; I always avoid it, since its documentation is rather sparse. Can you advise what this from the documentation could mean: "However, streaming isn't easy to deal with as the jq program will have [, ] (and a few other forms) as inputs."? What are the possible forms to expect? Is it explained somewhere? – Martin Mucha Dec 03 '22 at 12:03
  • @MartinMucha `--stream` breaks down the input by outputting all the states while traversing the input. Thus, when reaching a new scalar it produces an array with two items: its path (which is also an array) and its value (the scalar). When reaching the end of a level (called backtracking), it produces an array with one item: the current path (again, as array). `2|truncate_stream` lets you filter them "2 levels deep" into the path (representing `."select"` and `.[]`), and `fromstream` just puts that (filtered) stream together again (giving you objects instead of scalars); see the sketch below this comment. – pmf Dec 03 '22 at 12:17
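
To make the comment's description concrete, here is a minimal sketch of the raw streaming events, using a made-up two-row sample in place of the actual dump:

jq -c --stream '.' <<< '{"select":[{"a":1},{"b":2}]}'
[["select",0,"a"],1]
[["select",0,"a"]]
[["select",1,"b"],2]
[["select",1,"b"]]
[["select",1]]
[["select"]]

2|truncate_stream drops the first two path components and discards the two closing events whose paths become too short; fromstream then reassembles {"a":1} and {"b":2} from what remains.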

One strategy would be to use the --stream command-line option. It's a bit tricky to use, but if you want to use jq or gojq, it's the way to go for a space-time efficient solution for a large input.
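
For example, a minimal sketch along those lines (assuming the dump sits in dump.json), reusing the streaming idiom from the other answer but stopping after the very first row:

jq --stream -cn 'first(fromstream(2|truncate_stream(inputs)))' dump.json
{<row1>}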

Far easier to use would be my jm script, which is intended precisely to achieve the kind of objective you describe. In particular, please note its --limit option. E.g. you could start with:

jm -s --limit 1
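
For the dump above, usage might look like this (a sketch assuming jm accepts a file name as an argument; it also reads standard input):

jm -s --limit 1 dump.json
jm -s --limit 1 < dump.json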

See

https://github.com/pkoppstein/jm

How to read a 100+GB file with jq without running out of memory

peak