My script processes ~30 lines per second and uses just one CPU core.
while read -r line; do echo "$line" | jq -c '{some-transformation-logic}'; done < input.json >> output.json
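Part of the slowness is presumably that the loop forks a fresh jq process for every line. Since jq streams newline-delimited JSON natively, a single invocation avoids that overhead entirely (a sketch; `FILTER` is a stand-in for my actual transformation, which I've elided):

```shell
# One jq process streams the whole file; no per-line fork/exec.
# FILTER is a placeholder for the real transformation logic.
jq -c 'FILTER' input.json > output.json
```

This alone may get close to the target rate on one core, but it still leaves the other cores idle.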
The input.json file is ~6 GB and ~17M lines. It's newline-delimited JSON, not an array.
I have 16 cores (vCPUs on GCP), or more if that makes sense, and want to run this process in parallel. I know Hadoop is the way to go for this kind of thing, but it's a one-time job. How do I simply speed the process up to ~600 lines per second?
Line ordering is not important.