I have a large JSON file that I am streaming with jq.
This can be used as a test file:
{
  "a": "some",
  "b": [
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" },
    { "d": "some" }
  ]
}
I am trying to save a separate file each time a defined number of lines has arrived on STDIN. Several answers (How can I split one text file into multiple *.txt files?, How can I split a large text file into smaller files with an equal number of lines?, and Split a JSON array into multiple files using command line tools) suggest piping the output of the initial command to split.
jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json
This works. However, as I understand the | in Unix, it takes the output of the first command and sends it to the second, so STDIN will eventually contain all of the lines (defeating the purpose of streaming, although STDIN will likely not run out of memory, since it can be buffered on disk).
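As a minimal sketch of the pipe behavior (using a toy `seq` input in place of the jq output, and hypothetical `part_` file names), `split` appears to write each part file as soon as it has collected enough lines, rather than waiting for the whole stream:

```shell
# Toy stand-in for the jq pipeline: 10 lines, split into files of 4 lines each.
# split writes part_01.json, part_02.json, part_03.json as the lines arrive.
cd "$(mktemp -d)"
seq 1 10 | split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json
ls part_*.json
```

This creates part_01.json and part_02.json with 4 lines each and part_03.json with the remaining 2.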
I have read that xargs can send a predefined number of lines at a time to a command, so I tried this:
jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | xargs -I -l5 split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json
However, no output is generated, and the | is still there, so I assume I would get the same behavior. In addition, I believe split would overwrite the previously created files, since each batch would be a new invocation.
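The overwriting concern can be illustrated with a small sketch (toy input, hypothetical `part_` prefix): each fresh `split` invocation restarts its numeric suffix at 01 and clobbers the earlier files.

```shell
# Two separate split invocations on the same prefix: the second run
# restarts numbering at part_01 and overwrites the first run's output.
cd "$(mktemp -d)"
printf 'a\nb\n' | split -l 1 --numeric-suffixes=1 - part_
printf 'c\nd\n' | split -l 1 --numeric-suffixes=1 - part_
cat part_01   # contains "c" -- the first run's "a" is gone
```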
Does anyone have any advice? Am I missing something in my knowledge of the Unix terminal?
(This question, How to 'grep' a continuous stream?, describes how to grep a continuous stream using the --line-buffered approach; is there an equivalent for split?)
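One possibly related avenue, assuming a GNU coreutils split that supports it, is the --filter option, which runs a command for each chunk as the stream arrives instead of waiting for EOF. A minimal sketch with toy input and a hypothetical `part_` prefix:

```shell
# --filter runs the given command once per chunk, with $FILE set to the
# would-be output name; here each part file ends up holding its own line count.
cd "$(mktemp -d)"
seq 1 10 | split -l 4 --numeric-suffixes=1 --filter='wc -l > "$FILE"' - part_
cat part_01   # the 4-line first chunk was piped through wc -l
```

Whether this actually changes buffering behavior for a continuous stream would need testing; it is offered here only as a direction, not a confirmed answer.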