Iteratively split output of a stream in stdin

Question

I have a large JSON file that I am streaming with jq.

This can be used as a test file:

{
    "a": "some",
    "b": [
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        }
    ]
}

I am trying to save separate files once a defined number of lines has been provided in STDIN. Multiple answers (How can I split one text file into multiple *.txt files?, How can I split a large text file into smaller files with an equal number of lines?, Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?, Split a JSON array into multiple files using command line tools) suggest piping the output of the initial command into split.

jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json

This works; however, based on my knowledge of the | in unix, it takes the output of the first command and sends it to the second so STDIN will contain all of the lines (making the stream useless, although STDIN will likely not run out of memory, as it can be saved on disk).

I have read that xargs can send a predefined number of lines to a command, so I tried this:

jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | xargs -I -l5 split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json

However, no output is generated, and the | is still there, so I assume I would get the same behavior. In addition, I believe split would overwrite the previously created files, as each run would be a new invocation.

Does anyone have any advice? Am I missing something in my unix terminal knowledge?

(The question How to 'grep' a continuous stream? shows how to grep a continuous stream using the --line-buffered option. Is there an equivalent for split?)

_"the | in unix, it takes the output of the first command and sends it to the second so STDIN will contain all of the lines"_ No, the commands in a pipe run in parallel; each command might do a little buffering internally for optimizing IO though. — Fravadona, May 20 '22 at 20:24
`fromstream(2 | truncate_stream(inputs | select(.[0][0] == "b")))` is how it's done idiomatically, by the way — oguz ismail, May 20 '22 at 20:28
@Fravadona so the command I have will actually be doing what I want. Thank you! — Guido Muscioni, May 20 '22 at 20:31
Guido Muscioni, if you found a satisfactory answer, please post it as an answer. — Dudi Boy, May 20 '22 at 21:05

score 0 · Answer 1 · answered May 20 '22 at 21:25

As commented by @Fravadona:

"the | in unix, it takes the output of the first command and sends it to the second so STDIN will contain all of the lines"

No, the commands in a pipe run in parallel; each command might do a little buffering internally for optimizing IO though.

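So the command in the question already has the expected behavior: split consumes the lines as jq produces them (subject to that buffering) rather than waiting for the whole output.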
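
The parallel execution can be checked with a small experiment. This is only a sketch, assuming a POSIX-like shell with mktemp and sleep available; the scratch directory and the file names seen and concurrent are made up for illustration and are not part of the original command. The producer writes one line, pauses, and then checks whether the consumer has already handled that line before it writes the next one.

```shell
# Hypothetical scratch directory; the file names below are illustrative only.
tmp=$(mktemp -d)

{
  echo "first"
  # Pause so the consumer has time to act on "first" before more input arrives.
  sleep 1
  # If the consumer already recorded a line, it was running alongside the producer.
  if [ -s "$tmp/seen" ]; then
    echo yes > "$tmp/concurrent"
  else
    echo no > "$tmp/concurrent"
  fi
  echo "second"
} | while IFS= read -r line; do
  # Consumer: append each line to a file as soon as it is read.
  echo "$line" >> "$tmp/seen"
done

# Collect the results, then remove the scratch directory.
concurrent=$(cat "$tmp/concurrent")
seen_lines=$(grep -c '' "$tmp/seen")
rm -rf "$tmp"

echo "consumer processed input before the producer finished: $concurrent"
echo "lines received in total: $seen_lines"
```

If the pipe handed over the data only after the producer had exited, the check in the middle would record no. On a typical system it should record yes, which matches the comment above: both sides of the | run at the same time.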
