
I am trying to convert my JSON file to a CSV file using jq. Below is the sample input events.json file.

{
  "took" : 111,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "alerts",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "alertID" : "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6",
          "alertCategory" : "Server Alert",
          "description" : "Successfully started.",
          "logId" : null
          }
       },
       {
        "_index" : "alerts",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "alertID" : "2",
          "alertCategory" : "Server Alert",
          "description" : "Successfully stoped.",
          "logId" : null
          }
       }
   ]
  }
}

My rows in the CSV should contain the data inside each _source object. So my columns would be alertId, alertCategory, description, and logId, with their respective data.

I tried the command `jq --raw-output '.hits[] | [."alertId",."alertCategory",."description",."logId"] | @csv' < /root/events.json` and it's not working.

Can anyone help me with this?

darecoder

1 Answer

Your path expression is not right: you have a hits array inside an object named hits, and the fields you're trying to put in the CSV are present under the _source object.

So your expression should be as below. Use it along with the -r flag to produce raw output:

.hits.hits[]._source | [ .alertID, .alertCategory, .description, .logId ] | @csv
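For example, assuming jq is installed and the file is at /root/events.json as in the question, the full invocation would be:

```shell
# Extract each _source object as a CSV row.
# -r emits raw strings instead of JSON-encoded output.
jq -r '.hits.hits[]._source
       | [ .alertID, .alertCategory, .description, .logId ]
       | @csv' /root/events.json
# "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6","Server Alert","Successfully started.",
# "2","Server Alert","Successfully stoped.",
```

Note that @csv renders the null logId as nothing at all, which is why each row ends with a bare trailing comma.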

If a field is null, its string representation in the CSV output is simply empty. If you want an explicit "null" string instead, use the alternative operator (//) on the field you expect to be null: instead of .logId, write (.logId // "null").
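A quick sketch of the difference, using the sample file's field names:

```shell
# The alternative operator (//) substitutes the string "null"
# whenever logId is null, so @csv prints a quoted "null" cell
# instead of an empty one.
jq -r '.hits.hits[]._source
       | [ .alertID, .alertCategory, .description, .logId // "null" ]
       | @csv' /root/events.json
# "639387c3-0fbe-4c2b-9387-c30fbe7c2bc6","Server Alert","Successfully started.","null"
# "2","Server Alert","Successfully stoped.","null"
```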

To add the column names as a header row in the output CSV, you can prepend an array of names and pipe everything through @csv or join(",") in raw output mode (-r):

[ "alertId" , "alertCategory" , "description", "logId" ], 
( .hits.hits[]._source |  [ .alertID, .alertCategory, .description, .logId // "null" ]) | @csv

or

[ "alertId" , "alertCategory" , "description", "logId" ], 
( .hits.hits[]._source |  [ .alertID, .alertCategory, .description, .logId // "null" ]) | join(",")
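Putting it all together, a complete command with the header row (here the header labels match the sample file's actual field spellings, e.g. alertID) might look like:

```shell
# Emit a header row, then one CSV row per _source object,
# substituting "null" for missing logId values.
jq -r '[ "alertID", "alertCategory", "description", "logId" ],
       ( .hits.hits[]._source
         | [ .alertID, .alertCategory, .description, .logId // "null" ] )
       | @csv' /root/events.json
```

One design note: @csv quotes every string cell and escapes embedded quotes, while join(",") concatenates the raw values with commas and does no quoting, so prefer @csv if any field may contain a comma or quote.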
Inian
  • A big thanks for your response. I tried this command ```jq -r '.hits.hits[]._source | [."alertId",."alertCategory",."description",."logId" // "null"] | @csv' < /root/events.json``` Two problems: I am not getting the column names as the first row, and in some cases the column ```alertId``` has an alphanumeric value like ```639387c3-0fbe-4c2b-9387-c30fbe7c2bc6```, and in those cases it's coming out blank in the output. – darecoder Oct 27 '20 at 19:29
  • Hey, it's working now; I used ```alertId``` instead of ```alertID```. I just need help getting the column names in the first row. – darecoder Oct 27 '20 at 20:23
  • what if my columns are dynamic inside ```_source``` ? – darecoder Oct 28 '20 at 14:49
  • @darecoder: Could you ask a separate question for it? With the accepted status, any updates to the Q or A here should be frozen. I'll be glad to answer a new one. – Inian Oct 28 '20 at 14:50
  • https://stackoverflow.com/questions/64576887/json-file-to-csv-file-conversion-when-my-json-columns-are-dynamic @Inian – darecoder Oct 28 '20 at 16:13