I have a very large JSONL file (several million lines).
I want to sort this file by a given field (username in the example below), but I don't want to load it entirely into RAM.
Do you have a solution to suggest?
I had a look at jq and its sort_by function, but I think the file is not streamed.
Extra notes:
- The order within a group does not matter.
- Having one output file per username is also fine with me, if the method requires splitting the file.
Example:
Here is a dummy example of what my input file looks like:
{"username": "user1", "email": "email1", "value": "10"}
{"username": "user2", "email": "email2", "value": "30"}
{"username": "user2", "email": "email2", "value": "30"}
{"username": "user1", "email": "email1", "value": "5"}
{"username": "user3", "email": "email3", "value": "15"}
{"username": "user1", "email": "email1", "value": "40"}
{"username": "user3", "email": "email1", "value": "40"}
Here is the output I would like:
{"username": "user1", "email": "email1", "value": "10"}
{"username": "user1", "email": "email1", "value": "5"}
{"username": "user1", "email": "email1", "value": "40"}
{"username": "user2", "email": "email2", "value": "30"}
{"username": "user2", "email": "email2", "value": "30"}
{"username": "user3", "email": "email3", "value": "15"}
{"username": "user3", "email": "email1", "value": "40"}