0

I am trying to export a very large JSON file (zip compressed: full_ip-port.json) to CSV.

The file consists of a single JSON object that has a regular structure. Here is a representative extract (obtained by [to_entries[0,1]] | from_entries):

{
  "1122595": [
    {
      "ioc_value": "103.100.159.212:443",
      "ioc_type": "ip:port",
      "threat_type": "botnet_cc",
      "malware": "win.cobalt_strike",
      "malware_alias": "Agentemis,BEACON,CobaltStrike,cobeacon",
      "malware_printable": "Cobalt Strike",
      "first_seen_utc": "2023-05-27 02:31:39",
      "last_seen_utc": null,
      "confidence_level": 100,
      "reference": null,
      "tags": "CLOUDIE-AS-AP Cloudie Limited,CobaltStrike,cs-watermark-666666",
      "anonymous": "0",
      "reporter": "drb_ra"
    }
  ],
  "1122593": [
    {
      "ioc_value": "167.172.72.193:23",
      "ioc_type": "ip:port",
      "threat_type": "botnet_cc",
      "malware": "elf.bashlite",
      "malware_alias": "gayfgt,Gafgyt,qbot,torlus,lizkebab",
      "malware_printable": "Bashlite",
      "first_seen_utc": "2023-05-27 01:40:04",
      "last_seen_utc": null,
      "confidence_level": 75,
      "reference": "https://bazaar.abuse.ch/sample/6c901ba15327da68159712a9726807fb08868309726c55ef202818bfde22a5a7/",
      "tags": "Gafgyt",
      "anonymous": "0",
      "reporter": "abuse_ch"
    }
  ]
}

What I have so far:

Inner Part:

jq -r '.[] | .[] | to_entries | map(.value) | @csv' full_ip-port.json

Missing ID:

jq -r '{id: (. | keys[])} | to_entries | map(.value) | @csv' full_ip-port.json

How can I bring it together to a nice and clean CSV file?

peak
  • 105,803
  • 17
  • 152
  • 177
user3022917
  • 579
  • 2
  • 8
  • 20
  • 1
    Please [edit] your question to provide a [mre] (minimal!). Nobody likes to download files from external sites from sites with questionable domain names. – knittl May 27 '23 at 06:58
  • Also, might already be answered here: https://stackoverflow.com/questions/32960857/how-to-convert-arbitrary-simple-json-to-csv-using-jq – knittl May 27 '23 at 06:59

1 Answers1

1

Since your example is sufficiently uniform, you could get away with:

jq -r '
  to_entries
  | (.[0].value[0]|keys_unsorted) as $keys
  | .[]
  | .key as $key
  | .value[]
  | [$key, .[$keys[]] ]
  | @csv
'

With your input, the first row would be:

"1122595","103.100.159.212:443","ip:port","botnet_cc","win.cobalt_strike","Agentemis,BEACON,CobaltStrike,cobeacon","Cobalt Strike","2023-05-27 02:31:39",,100,,"CLOUDIE-AS-AP Cloudie Limited,CobaltStrike,cs-watermark-666666","0","drb_ra"

You could also use ["key"] + $keys (for example) to produce a CVS header:

to_entries
| (.[0].value[0]|keys_unsorted) as $keys
| ["key"] + $keys,
  (.[]
   | .key as $key
   | .value[]
  | [$key, .[$keys[]] ])
| @csv
peak
  • 105,803
  • 17
  • 152
  • 177
  • Having the header too would be nice. May you share the full command? Thx – user3022917 May 27 '23 at 08:18
  • I simply used: echo '"ioc_id","ioc_value","ioc_typ","threat_type","malware","malware_alias","malware_printable","first_seen_utc","last_seen_utc","confidence_level","reference","tags","anonymous","reporter"' – user3022917 May 27 '23 at 08:29