I have this (simplified for learning) input file:
{"type":"a","id":"1"}
{"type":"a","id":"2"}
{"type":"b","id":"1"}
{"type":"c","id":"3"}
I'd like to turn it into:
{
"a": [1,2],
"b": [1],
"c": [3]
}
using the --stream option. It's not actually needed here, this is just for learning; or at least it does not seem viable to use group_by or reduce without it on bigger files (even a few GB seems rather slow).
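For context, the non-streaming variant I have in mind would be something like the following sketch (it slurps the whole file with -s, and converting the string ids with tonumber is my assumption, based on the desired output above):

jq -s 'group_by(.type) | map({key: .[0].type, value: map(.id | tonumber)}) | from_entries' test3

This needs the entire input in memory at once, which is exactly what makes it unattractive for big files.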
I understand that I can write something like:
jq --stream -cn 'reduce (inputs|select(length==2)) as $i([]; . + ..... )' test3
but that would just process the data per line (per item in the stream), i.e. I can see either the type or the id, but never both, so there is no place to create the pairing. I can cram everything into one big array, but that is the opposite of what I need.
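To make the problem concrete: under --stream, each input line explodes into several [path, value] events, so type and id arrive separately. For the first input line,

jq --stream -c . test3

prints

[["type"],"a"]
[["id"],"1"]
[["id"]]

where the final length-1 event (a path without a value) marks the end of that top-level object, so it is the natural point to flush a pairing.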
How do I create such pairings? I don't even know how to produce (using --stream):
{"a":1}
{"a":2}
...
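Presumably something like this sketch works (fromstream reassembles each top-level entity from its events; the tonumber conversion is again my assumption):

jq --stream -cn 'fromstream(inputs) | {(.type): (.id | tonumber)}' test3

which prints

{"a":1}
{"a":2}
{"b":1}
{"c":3}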
I suspect both (the first target transformation, and the one above this paragraph) are also some trivial usage of foreach. I have a working example of one here, but all its .accumulator and .complete keywords (IIUC) are just magic to me now. I understood it once, but ... Sorry for the trivial questions.
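If I reconstruct it correctly, .accumulator and .complete are not keywords at all, just field names of a hand-rolled state object. A foreach that pairs the events back into whole objects might look roughly like this (a sketch, assuming flat top-level objects as in the example input):

jq --stream -cn '
  foreach inputs as $event (
    # .accumulator collects the fields of the object currently being read;
    # .complete holds a finished object, or null while one is still in progress
    {accumulator: {}, complete: null};
    if ($event | length) == 2
    then .accumulator[$event[0][0]] = $event[1] | .complete = null
    else .complete = .accumulator | .accumulator = {}
    end;
    # emit only when the closing length-1 event has set .complete
    select(.complete != null) | .complete
  )
' test3

The extract expression (the third argument) yields nothing until the closing event arrives, at which point it emits the reconstructed object, e.g. {"type":"a","id":"1"}.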
UPDATE regarding performance:
@pmf provided two solutions in his answer: a streaming one and a non-streaming one. Thanks for that; I was able to write the non-streaming version myself, but not the streaming one. When testing, the streaming variant was (I'm not 100% sure now, but ...) 2-4 times slower. That makes sense if the data does not fit into memory, but luckily in my case it does.

So I ran the non-streaming version on a ~1 GB file on a laptop with a not-actually-that-slow i7-9850H CPU @ 2.60GHz. To my surprise it was not done within 16 hours, so I killed it as not viable for my use case of potentially much bigger input files.

Considering the simplicity of the input, I then wrote a pipeline using bash, grep, sed, paste and tr. Even though it relied on regexes, was overall inefficient as hell, and used no parallelism, it crunched the whole file correctly in 55 seconds. I understand that character manipulation is faster than parsing JSON, but that much of a difference? Isn't there some better approach that still parses JSON? I don't mind spending more CPU power, but if I'm using jq, I'd like to use its functions and process JSON as JSON, not as raw characters the way I did with bash.
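One middle ground I am wondering about (a sketch, again with my tonumber assumption): dropping both -s and --stream. With -n, inputs reads the top-level objects one at a time, so memory stays bounded by the (small) result while avoiding the per-event overhead of --stream:

jq -n 'reduce inputs as $o ({}; .[$o.type] += [$o.id | tonumber])' test3

Whether this closes the gap to the bash pipeline I cannot say, but at least it processes JSON as JSON without materializing the whole file.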