0

I have a lot of rather large JSON logs which need to be imported into several DB tables. I can easily parse them and create 1 CSV for import. But how can I parse the JSON and get 2 different CSV files as output? Simple (nonsense) example:

testJQ.log

{"id":1234,"type":"A","group":"games"}
{"id":5678,"type":"B","group":"cars"}

using

cat testJQ.log|jq --raw-output '[.id,.type,.group]|@csv'>testJQ.csv

I get one file testJQ.csv

1234,"A","games
5678,"B","cars"

But I would like to get this

types.csv

1234,"A"
5678,"B"

groups.csv

1234,"games"
5678,"cars"

Can this be done without having to parse the JSON twice, first time creating the types.csv and second time the groups.csv like this?

cat testJQ.log|jq --raw-output '[.id,.type]|@csv'>types.csv
cat testJQ.log|jq --raw-output '[.id,.group]|@csv'>groups.csv
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • jq doesn't output to files so you won't be able to do so. You will always need a separate instance if you want to write to different files. – Jeff Mercado May 23 '18 at 22:22

2 Answers2

1

You will either need to run jq twice, or to run jq in conjunction with another program to "split" the output of the call to jq. For example, you could use a pipeline of the form: jq -c ... | awk ...

The potential disadvantage of the pipeline approach is that if JSON is the final output, it will be JSONL; but obviously that doesn't apply here.

There are many ways to craft such a pipeline. For example, assuming there are no raw newlines in the CSV:

< testJQ.log jq -r '
    "types",  ([.id,.type] |@csv),
    "groups", ([.id,.group]|@csv)' |
  awk 'NR % 2 == 1 {out=$1; next} {print >> out".csv"}'

Or:

< testJQ.log jq -r '([.id,.type],[.id,.group])|@csv' |
    awk '{ out = ((NR % 2) == 1) ? "types" : "groups"; print >> out".csv"}'

For other examples, see e.g.

Handling raw new-lines

Whether or not you split the CSV into multiple files, there is a potential issue with embedded raw newlines. One approach is to change "\n" in JSON strings to "\\n", e.g.

jq -r '([.id,.type],[.id,.group])
       | map(if type == "string" then gsub("\n";"\\n") else . end)
       | @csv'
peak
  • 105,803
  • 17
  • 152
  • 177
  • Sounds interesting, but a bit to risky for production use. Would be great if JQ could produce something in the format of --types-- 1234,"A" 5678,"B" --groups-- 1234,"games" 5678,"cars" instead of "--types--" 1234,"A" "--groups--" 1234,"games" "--types--" 5678,"B" "--groups--" 5678,"cars" as a result of `(["--types--"],[.id,.type],["--groups--"],[.id,.group])|@csv` That would make the splitting of the resulting csv a lot safer and faster. I guess I have to live with having to parse the same log multiple times. – Kristian Sköld May 24 '18 at 16:05
1

I suppose one way you could hack this up is to output the contents of one file to stdout and the others to stderr and redirect to separate files. Of course you're limited to two files though.

$ <testJQ.log jq -r '([.id,.type]|@csv),([.id,.group]|@csv|stderr|empty)' \
    1>types.csv 2>groups.csv

stderr outputs to stderr but the value propagates to the output, so you'll want to follow that up with empty to swallow that up.

Personally I wouldn't recommend doing this, I would just write a python script (or other language) to parse this if you needed to output to multiple files.

Jeff Mercado
  • 129,526
  • 32
  • 251
  • 272
  • Currently there is no way to produce "raw output" on stderr, so in this particular scenario, the (stderr|empty) technique won't produce the desired result in groups.csv. – peak May 24 '18 at 01:46
  • Hmm, yeah, for this particular case, raw output wouldn't work. But this might be a viable option if we didn't need raw output (or other output option that doesn't affect stderr). Though, depending on the shell, we could redirect to a process which could say, pass the values as raw output. Like for bash: `... 2> >(jq -r . > groups.csv)` – Jeff Mercado May 24 '18 at 02:12
  • Yes, it's a potentially useful technique, though it's no excuse for jq not to do better. – peak May 24 '18 at 02:45
  • Interesting; `stderr` appears to be supported in 1.5 (if not earlier) but it isn't documented. – chepner May 25 '18 at 14:39