
I have a bit of a problem. I've tried many different variations of jq -r against a large set of JSON files produced by an object detector, but I can't get it to give me a CSV file. I've reduced these massive files to the short example below, which has exactly the same data structure. (Note: the key names such as "book", "person" and "animal" change from script to script, so a way to do this without hardcoding the keys in the command would be very welcome, though it isn't required.)

Example:

{
    "/src/files/image_1.png": {
        "book": 0.01711445301771164,
        "person": 0.000330559065533263624,
        "place": 0.9814764857292175,
        "animal": 1.8662762158783153e-05,
        "vehicle": 0.0010597968939691782
    },
    "/src/files/image_2.png": {
        "book": 0.23741412162780762,
        "person": 0.1587823033328247,
        "place": 0.59659236669504,
        "animal": 0.0036556862760335207,
        "vehicle": 0.003555471543222666
    }
}

Ideally, I'd like to make a csv file whose tabular format would look something like this:

File Book Person Place Animal Vehicle
/src/files/image_1.png 0.01711445301771164 0.000330559065533263624 0.9814764857292175 1.8662762158783153e-05 0.0010597968939691782
/src/files/image_2.png 0.23741412162780762 0.1587823033328247 0.59659236669504 0.0036556862760335207 0.003555471543222666
peak
David Moore
  • How is that JSON "semi-malformed"? – oguz ismail Mar 15 '21 at 07:00
  • This example isn't a good illustration of what I was talking about; I suppose this is a case in which I gave too much information about the problems in my files, which distracted from the goal. – David Moore Mar 15 '21 at 09:59

2 Answers

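. as $data |
keys_unsorted as $ids |                          # outer keys (file names), in input order
( .[ $ids[0] ] | keys_unsorted ) as $fields |   # column names, taken from the first entry
(
   [ "id", $fields[] ],                          # header row
   ( $ids[] | [ ., $data[.][$fields[]] ] )       # one row per file, values looked up by name
) | @csv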


or

keys_unsorted as $ids |
( .[ $ids[0] ] | keys_unsorted ) as $fields |
(
   [ "id", $fields[] ],
   ( to_entries[] | [ .key, .value[$fields[]] ] )
) | @csv


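Neither handles an empty input ({}) well: with no outer keys, $ids[0] is null, so .[ $ids[0] ] tries to index the object with null and jq stops with an error instead of printing nothing.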
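One way to avoid the empty-input failure is to test for that case first; in jq, the program could be wrapped as if length == 0 then empty else ... end. The Python sketch below is not part of the answer: it mirrors the same data-driven logic (outer keys become rows, the first inner object's keys become columns, values are looked up by name) and adds that guard. The names to_csv_rows and render are hypothetical, csv.QUOTE_NONNUMERIC is used to approximate @csv's quoting of strings, and long floats may be formatted slightly differently from jq's output.

```python
import csv
import io


def to_csv_rows(data):
    """Yield a header row, then one row per outer key (nothing for {})."""
    if not data:                    # guard, like: if length == 0 then empty
        return
    ids = list(data)                # like keys_unsorted: input order is kept
    fields = list(data[ids[0]])     # columns come from the first inner object
    yield ["id", *fields]
    for id_ in ids:
        # values are looked up by name, like $data[.][$fields[]]
        yield [id_, *(data[id_][f] for f in fields)]


def render(data):
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes strings but not numbers, roughly like @csv
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(to_csv_rows(data))
    return buf.getvalue()


example = {
    "/src/files/image_1.png": {"book": 0.5, "person": 0.25},
    "/src/files/image_2.png": {"book": 0.125, "person": 0.75},
}
print(render(example), end="")
print(repr(render({})))  # empty input: no rows and no error
```

Like the jq programs, this assumes every inner object has at least the keys of the first one; a missing key raises KeyError here, whereas jq would simply produce null for that field.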


For comparison, @peak's solution with headers would be:

keys_unsorted as $ids |
( .[ $ids[0] ] | keys_unsorted ) as $fields |
(
   [ "id", $fields[] ],
   ( $ids[] as $id | [ $id, .[$id][$fields[]] ] )
) | @csv


It doesn't handle an empty input ({}) well either.

ikegami

The following solution is a bit trickier than it might otherwise be because it is "data-driven" (no key names other than "File" are hard-coded in the jq program) while not assuming that the inner keys are ordered consistently:

  keys_unsorted as $outer
  | (.[$outer[0]] | keys_unsorted) as $inner
  | ["File"] + ($inner|map((.[:1]|ascii_upcase) + .[1:])),
    ($outer[] as $k
     | [$k] + [.[$k] | .[$inner[]]])
  | @tsv

You might of course wish to use @csv instead of @tsv.
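Per the jq manual, the two formats differ mainly in how strings are written: @csv puts strings in double quotes and escapes an embedded quote by doubling it, while @tsv leaves strings unquoted and instead writes tab, line feed, carriage return and backslash as \t, \n, \r and \\. The Python sketch below mimics only those string rules; csv_field and tsv_field are hypothetical names, and non-string values are simply passed through str(), which suits the numbers in this data but is not how jq formats every value.

```python
def csv_field(v):
    # @csv-style: strings are double-quoted, embedded quotes are doubled
    if isinstance(v, str):
        return '"' + v.replace('"', '""') + '"'
    return str(v)  # numbers: approximated with str()


TSV_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def tsv_field(v):
    # @tsv-style: strings stay unquoted, but \, tab, LF and CR become escapes
    if isinstance(v, str):
        return "".join(TSV_ESCAPES.get(c, c) for c in v)
    return str(v)  # numbers: approximated with str()


row = ['a "quoted"\tname', 0.5]
print(",".join(csv_field(v) for v in row))
print("\t".join(tsv_field(v) for v in row))
```

For file paths and probabilities like the ones in the question, neither rule changes anything except that @csv adds quotes around the path and header cells.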

(There are many SO questions illustrating how headers can be added, e.g. the following includes an illustration of how to add them without hardcoding them, and with dynamically generated "-" lines beneath the primary headers: How to format a JSON string as a table using jq?)
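Two details of peak's program, capitalizing each header with (.[:1]|ascii_upcase) + .[1:] and looking values up by name so that inner objects with differently ordered keys still line up under the right header, can be checked with the small Python sketch below. It is an illustration rather than part of the answer: the helper names are hypothetical, str.upper() is not restricted to ASCII as ascii_upcase is, and joining with tabs skips @tsv's escaping (the sample values need none).

```python
def capitalize_first(s):
    # like jq's (.[:1]|ascii_upcase) + .[1:]; unlike str.capitalize(),
    # the rest of the string is left unchanged
    return s[:1].upper() + s[1:]


def to_tsv_lines(data):
    outer = list(data)
    inner = list(data[outer[0]])  # column order comes from the first object
    lines = ["\t".join(["File", *map(capitalize_first, inner)])]
    for k in outer:
        # look values up by name, so a differently ordered object still lines up
        lines.append("\t".join([k, *(str(data[k][f]) for f in inner)]))
    return lines


data = {
    "/src/files/image_1.png": {"book": 0.25, "person": 0.5},
    "/src/files/image_2.png": {"person": 0.75, "book": 0.125},  # keys reversed
}
for line in to_tsv_lines(data):
    print(line)
```

The second file's values come out in Book, Person order even though its keys were stored the other way round, which is the point of indexing with .[$inner[]] rather than taking the inner values positionally.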

peak
  • A table with "randomly"-ordered columns is not too useful without a header row. – ikegami Mar 15 '21 at 07:27
  • This did indeed work far better than my efforts, but ikegami's was a full solution. I will need to do a fair bit of reading to reach a point where I can script as well as you. – David Moore Mar 15 '21 at 09:58
  • @DavidMoore - Since you want a complete solution, it is now provided, including capitalization of the header names. – peak Mar 15 '21 at 14:59