4

I am reading this StackOverFlow discussion converting JSON into CSV and it seems great, but I cant get basic jq to work.. I am not sure what I am doing wrong. I have tried the basic thing and I cant crack whats wrong. Here is my ES query in a Shell Script

curl -XGET 'http://es-1:9200/data_latest/customer/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": ["customer_app_version", "customer_num_apps", "customer_name","app_disk_size_bytes","app_memory_capacity_bytes"],
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "is_app_customer": {
                        "value": "true"
                    }
                }
            }]
        }
    },
    "aggs": {
        "Customer_UUID": {
            "terms": {
                "field": "customer_uuid",
                "size": 100
            }
        }
    }
}

' Shell Script Output

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6171,
    "max_score": 1.8510876,
    "hits": [
      {
        "_index": "data_latest_v1",
        "_type": "customer",
        "_id": "0003245-4844-9015-1z2e-d4ae5234rd56",
        "_score": 1.8510876,
        "_source": {
          "customer_app_version": "el7.20150513",
          "customer_num_apps": 3,
          "app_memory_capacity_bytes": 405248409600,
          "customer_name": "Timbuktu Inc",
          "app_disk_size_bytes": 25117047875604
        }
      },
      {
        "_index": "data_latest_v1",
        "_type": "customer",
        "_id": "0003245-4844-9015-1z2e-d4ae5234rd56",
        "_score": 1.8510876,
        "_source": {
          "customer_app_version": "el4.20150513",
          "customer_num_apps": 34,
          "app_memory_capacity_bytes": 58923439600,
          "customer_name": "Bunnies Inc",
          "app_disk_size_bytes": 36517984275604
        }
      }
    ]
  }
}

(truncated, but the subset above is syntactically valid)

  1. How do I use jq within the shell script to output the Keys and values in the _source field (nothing else) as a CSV? I know I am asking something that is described in the other discussion, but I tried and could not get it

For example, I added after the ' (end of the above script) I added | jq -r '."customer_name"'

and also tried

| jq -r '.customer_name'

For both I get output like this.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
103 13566  100 13566    0   346   507k  13248 --:--:-- --:--:-- --:--:--  537k
null

What am i doing wrong? What do I need to do? would be super helpful if someone can guide me here.

Community
  • 1
  • 1
Sam G
  • 151
  • 1
  • 3
  • 13
  • You might want to suppress curl's status output so it doesn't complicate things here. – Charles Duffy Mar 16 '17 at 22:44
  • Also, the output you gave isn't actually well-formed JSON -- it's missing some close elements. Please make sure the data you provide is correct enough to allow folks to test their answers. – Charles Duffy Mar 16 '17 at 22:45
  • Thanks! I will make sure that it is valid Json.. I did call out that the output was being truncated. I am still learning how to suppress some of the output from ES (ex: took, shards etc.. ) – Sam G Mar 16 '17 at 23:15

1 Answers1

8

To describe in your jq query how to navigate in the document to the data you want to extract might look like the following:

jq -r '.hits.hits[]._source.customer_name'

In this case, the output is:

Timbuktu Inc
Bunnies Inc

To generate a key/value CSV, one might use:

jq -r '.hits.hits[]._source | to_entries | .[] | [.key, .value] | @csv'

...with output:

"customer_app_version","el7.20150513"
"customer_num_apps",3
"app_memory_capacity_bytes",405248409600
"customer_name","Timbuktu Inc"
"app_disk_size_bytes",25117047875604
"customer_app_version","el4.20150513"
"customer_num_apps",34
"app_memory_capacity_bytes",58923439600
"customer_name","Bunnies Inc"
"app_disk_size_bytes",36517984275604

If you want customer name to be a column of its own, this might instead be:

jq -r '.hits.hits[]._source | .customer_name as $name | del(.customer_name) | to_entries | .[] | [$name, .key, .value] | @csv'

...with output:

"Timbuktu Inc","customer_app_version","el7.20150513"
"Timbuktu Inc","customer_num_apps",3
"Timbuktu Inc","app_memory_capacity_bytes",405248409600
"Timbuktu Inc","app_disk_size_bytes",25117047875604
"Bunnies Inc","customer_app_version","el4.20150513"
"Bunnies Inc","customer_num_apps",34
"Bunnies Inc","app_memory_capacity_bytes",58923439600
"Bunnies Inc","app_disk_size_bytes",36517984275604

If you're willing to hardcode the column names, consider instead:

jq -r '.hits.hits[]._source | [.customer_name, .customer_app_version, .customer_num_apps, .app_memory_capacity_bytes, .app_disk_size_bytes] | @csv'

with output:

"Timbuktu Inc","el7.20150513",3,405248409600,25117047875604
"Bunnies Inc","el4.20150513",34,58923439600,36517984275604
Charles Duffy
  • 280,126
  • 43
  • 390
  • 441
  • Thanks for pointing out that I need to traverse the JSON structure. I will go read the other thread on how to generate the CSV I am looking for. – Sam G Mar 16 '17 at 23:25
  • Charles! You are a rock star! super helpful and so nice of you! I was just reading the other thread and started playing with jq and just figure out how to just list the _source section and was about to read how to get to CSV. You already have described the answer. I will go read it now. Thanks a ton! I will update once I am done. – Sam G Mar 16 '17 at 23:44
  • Super helpful! One minor thing is I had to add double quotes to each attribute to retrieve the values. I haven't figured out yet how to get the column names as the first lane. For now, I write the column names into the file before I run the ES script. – Sam G Mar 17 '17 at 16:35
  • Everything I gave here worked for me *exactly* as given in the answer, so if you had to make changes, I'm curious about the details of exactly what you were running instead and exactly how it failed. If you wouldn't mind uploading a [gist](https://gist.github.com/) with a reproducer that shows the failure, I'd be very interested to see it. – Charles Duffy Mar 17 '17 at 16:41
  • The only change I had to do was add double quotes for this part [."customer_name", ."customer_app_version", ."customer_num_apps", ."app_memory_capacity_bytes", ."app_disk_size_bytes"] – Sam G Mar 18 '17 at 01:30
  • It's expected for those quotes to be needed when names contain dashes, but they shouldn't be needed for underscores -- so that's odd. – Charles Duffy Sep 19 '18 at 12:36