I have a very large file (20GB+ compressed) called input.json
containing a stream of JSON objects as follows:
{
"timestamp": "12345",
"name": "Some name",
"type": "typea"
}
{
"timestamp": "12345",
"name": "Some name",
"type": "typea"
}
{
"timestamp": "12345",
"name": "Some name",
"type": "typeb"
}
I want to split this file into separate files based on the type property: typea.json, typeb.json, etc., each containing its own stream of JSON objects whose type matches the filename.
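For the sample above, that means typea.json would end up with the first two objects and typeb.json with the third (shown compacted here, though the output formatting doesn't matter to me):

typea.json:

{"timestamp":"12345","name":"Some name","type":"typea"}
{"timestamp":"12345","name":"Some name","type":"typea"}

typeb.json:

{"timestamp":"12345","name":"Some name","type":"typeb"}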
I've managed to solve this problem for smaller files; however, with such a large file I run out of memory on my AWS instance. Since I want to keep memory usage down, I understand I need to use --stream, but I'm struggling to see how to achieve this.
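For context, my small-file approach looked roughly like the following (a sketch rather than my exact script; the -s slurp parses the entire stream into a single in-memory array, which is presumably what fails at this scale):

# Illustrative small-file approach: slurp everything with -s, then write
# out the objects matching each type. Slurping the whole 20GB+ stream
# into one array is what exhausts memory.
# ("typea" and "typeb" are just the types from the sample above.)
for t in typea typeb; do
    jq -c -s --arg t "$t" '.[] | select(.type == $t)' input.json > "$t.json"
done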
So far,

jq -c --stream 'select(.[0][0] == "type" and length == 2) | .[1]' input.json

returns the value of each type property (the length == 2 guard skips the path-only closing events that --stream also emits), but how do I use these values to filter the objects themselves?
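For reference, on the sample input above that command prints one value per line:

"typea"
"typea"
"typeb"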
Any help would be greatly appreciated!