1

Trying to convert the json data below into csv, using jq and or awk or python or perl or anything from Linux shell. Will appreciate your scripting help here.

{
    "inventory": [
        {
          "profile": "Earth",
          "invState": [
            {
              "count": 6,
              "Status": "ONLINE"
            },
            {
              "count": 8,
              "Status": "EXIST"
            },
            {
              "count": 1,
              "Status": "GIVEN"
            },
            {
              "count": 4,
              "Status": "ERROR"
            },
            {
              "count": 49,
              "Status": "INSTOCK"
            },
            {
              "count": 389,
              "Status": "RELEASED"
            },
            {
              "count": 68,
              "Status": "DELETED"
            },
            {
              "count": 280,
              "Status": "CONNECTED"
            },
            {
              "count": 1,
              "Status": "UNINSTOCK"
            }
          ]
        },
        {
          "profile": "Mars",
          "invState": [
            {
              "count": 7,
              "Status": "EXIST"
            },
            {
              "count": 20,
              "Status": "INSTOCK"
            },
            {
              "count": 110,
              "Status": "RELEASED"
            },
            {
              "count": 16,
              "Status": "DELETED"
            },
            {
              "count": 41,
              "Status": "CONNECTED"
            },
            {
              "count": 1,
              "Status": "UNINSTOCK"
            }
          ]
        },
        {
          "profile": "Mercury",
          "invState": [
            {
              "count": 4,
              "Status": "EXIST"
            },
            {
              "count": 1224,
              "Status": "INSTOCK"
            },
            {
              "count": 3,
              "Status": "CONNECTED"
            },
            {
              "count": 18,
              "Status": "RELEASED"
            },
            {
              "count": 5,
              "Status": "DELETED"
            }
          ]
        }
       ]
}

The csv output should look like this:

Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
etc

Please see the attached image

csv output etc

Will appreciate any advice here. I have tried using jq and awk but not getting the right result.

peak
  • 105,803
  • 17
  • 152
  • 177
Oluvictor
  • 7
  • 5
  • 2
    jq syntax is not straightforward. should be much easier to learn, e.g., python for a while and use python's json module. – pynexj Sep 17 '20 at 05:37

6 Answers6

4

Here is one using GNU awk's JSON extension:

$ gawk '
@load "json"
BEGIN {
    OFS=","
}
{
    lines=lines $0                           # keep appending lines
    if(json_fromJSON(lines,data)!=0) {       # until you have a valid object
        for(inventory in data["inventory"])  # then we iterate the arrays and output
            for(invState in data["inventory"][inventory]["invState"])
            print data["inventory"][inventory]["profile"],
                data["inventory"][inventory]["invState"][invState]["count"],
                data["inventory"][inventory]["invState"][invState]["Status"]
        lines=""                             # reset the object array for next round
    }
}' file.json

Parts of the output:

Earth,6,ONLINE
Earth,8,EXIST
...
Mars,7,EXIST
Mars,20,INSTOCK
...
Mercury,4,EXIST
Mercury,1224,INSTOCK
...
James Brown
  • 36,089
  • 7
  • 43
  • 59
3

The following will produce the output as shown if jq is invoked with the -r option:

.inventory[]
| .profile as $profile
| .invState[]
| [$profile] + [.count, .Status]
| join(",")

Note, however, that if CSV output is desired, then it might be better to replace the join in the last line by @csv.

Shorter versions

The following is equivalent to the above:

.inventory[]
| [.profile] + (.invState[] | [.count, .Status])
| join(",")

If the order of the "count" and "Status" keys is fixed, you could get away with:

.inventory[] 
| [.profile] + (.invState[] | [.[]])
| join(",")
peak
  • 105,803
  • 17
  • 152
  • 177
  • I like this solution provided by @peak because it is neat and compact. However I find awk (i.e. answer provided by @Luuk) more forgiving wrt respect to the json syntax. jq will break if on extract bracket is found in the json file, whereas good ole awk works like a breeze. – Oluvictor Sep 18 '20 at 16:48
2

I produced quite a few CSV files out of JSON files with jq. I find jq quite suited for this.

Breaking out of an array

From:

["Earth","Mars","Mercury"]

To:

"Earth"
"Mars"
"Mercury"

Is achieved with this filter: .[] which iterates over the array. As the documentation puts it:

Running .[] with the input [1,2,3] will produce the numbers as three separate results, rather than as a single array.

From string to text

From:

["Earth","Mars","Mercury"]

To:

Earth
Mars
Mercury

Is achieved with the --raw-output parameter on the CLI combined with the .[] filter. e.g.,

jq --raw-output '.[]' input.json

Saving to variables

You'll need to hold a reference to .profile while you process the rest of the inventory. Here's a contrived example:

From:

[ {"x": "Earth", "y": ["1", "2", "3"]}
, {"x": "Mars", "y": ["1", "2", "3"]}
]

To:

"Earth1"
"Earth2"
"Earth3"
"Mars1"
"Mars2"
"Mars3"

Is achieved with this filter: .[] | .x as $x | .y[] | $x + .

(You save .x into var $x that you can refer to in your filter.)


To answer your question, given this input.json:

{
  "inventory": [
    {
      "profile": "Earth",
      "invState": [
        {
          "count": 6,
          "Status": "ONLINE"
        },
        {
          "count": 8,
          "Status": "EXIST"
        },
        {
          "count": 1,
          "Status": "GIVEN"
        },
        {
          "count": 4,
          "Status": "ERROR"
        },
        {
          "count": 49,
          "Status": "INSTOCK"
        },
        {
          "count": 389,
          "Status": "RELEASED"
        },
        {
          "count": 68,
          "Status": "DELETED"
        },
        {
          "count": 280,
          "Status": "CONNECTED"
        },
        {
          "count": 1,
          "Status": "UNINSTOCK"
        }
      ]
    },
    {
      "profile": "Mars",
      "invState": [
        {
          "count": 7,
          "Status": "EXIST"
        },
        {
          "count": 20,
          "Status": "INSTOCK"
        },
        {
          "count": 110,
          "Status": "RELEASED"
        },
        {
          "count": 16,
          "Status": "DELETED"
        },
        {
          "count": 41,
          "Status": "CONNECTED"
        },
        {
          "count": 1,
          "Status": "UNINSTOCK"
        }
      ]
    },
    {
      "profile": "Mercury",
      "invState": [
        {
          "count": 4,
          "Status": "EXIST"
        },
        {
          "count": 1224,
          "Status": "INSTOCK"
        },
        {
          "count": 3,
          "Status": "CONNECTED"
        },
        {
          "count": 18,
          "Status": "RELEASED"
        },
        {
          "count": 5,
          "Status": "DELETED"
        }
      ]
    }
  ]
}

The following invokation of jq should do the trick:

jq --raw-output '.inventory[] | .profile as $p | .invState[] | "\($p),\(.count),\(.Status)"' input.json
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED
customcommander
  • 17,580
  • 5
  • 58
  • 84
2

If you don't have jq or gawks json extension (requires gawkextlib) and your input is always as simple and regular as in your example then this will do what you want using GNU awk for the 3rd arg to match() and gensub():

$ cat tst.awk
BEGIN { OFS="," }
match($0,/"([^"]+)": *("[^"]*"|[0-9]+)/,a) {
    tag = a[1]
    val = gensub(/^"|"$/,"","g",a[2])
    f[tag] = val
    if ( tag == "Status" ) {
        print f["profile"], f["count"], f["Status"]
    }
}

$ awk -f tst.awk file
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
1

This might work for you (GNU sed):

sed -nE '/profile/{s/.*"(\S+)".*/\1/;h};
         /count/{s/.* (\S+),.*/\1/;H};
         /Status/{s/.*"(\S+)".*/\1/;H;g;s/\n/,/gp;g;s/\n.*\n.*//;h}' file

Stuff the profile, count and Status info in the hold space and after doing so for Status, retrieve the hold space, replace the newlines by commas, print and then remove count and Status details, ready for the next time.

N.B. As this is json, it is better to use jq as this will always be a more robust solution.

potong
  • 55,640
  • 6
  • 51
  • 83
0
awk -F: 'BEGIN{ OFS=""; p=c=s=""; }
    /"profile"/{ p=$2 }
    /"count"/{ c=$2 }
    /"Status"/{ s=$2 }
    { if(s!="") { print p,c,s; s="" }}'  file.json

output:

 "Earth", 6, "ONLINE"
 "Earth", 8, "EXIST"
 "Earth", 1, "GIVEN"
 "Earth", 4, "ERROR"
 "Earth", 49, "INSTOCK"
 "Earth", 389, "RELEASED"
 "Earth", 68, "DELETED"
 "Earth", 280, "CONNECTED"
 "Earth", 1, "UNINSTOCK"
 "Mars", 7, "EXIST"
 "Mars", 20, "INSTOCK"
 "Mars", 110, "RELEASED"
 "Mars", 16, "DELETED"
 "Mars", 41, "CONNECTED"
 "Mars", 1, "UNINSTOCK"
 "Mercury", 4, "EXIST"
 "Mercury", 1224, "INSTOCK"
 "Mercury", 3, "CONNECTED"
 "Mercury", 18, "RELEASED"
 "Mercury", 5, "DELETED"

It is csv, that why text fields are surrounded by double quotes....

If you json is not in 'pretty_print' you might have to do something like:

cat file.json | json_pp | awk .....

Luuk
  • 12,245
  • 5
  • 22
  • 33