
I am trying to create a JSON object from a string in bash. The string is as follows:

CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0

The output is from the docker stats command, and my end goal is to publish custom metrics to AWS CloudWatch. I would like to format this string as JSON:

{
    "CONTAINER":"nginx_container",
    "CPU%":"0.02%", 
    ....
}

I have used the jq command before, and it seems like it should work well in this case, but I have not been able to come up with a good solution yet, other than hardcoding variable names, indexing with sed or awk, and then creating the JSON from scratch. Any suggestions would be appreciated. Thanks.

michael_65
  • I don't think jq is the tool for the job (it's JSON in/JSON out). I did something similar recently and ended up using the Ruby CSV and JSON modules (CSV can use | as a delimiter); Python has similar classes. – Jimmy Aug 09 '16 at 21:25
  • Some people use awk to create their JSON from delimited input. – Jimmy Aug 09 '16 at 21:26
  • @Jimmy, eh? jq is *absolutely* an excellent tool for this job. – Charles Duffy Aug 09 '16 at 23:49
  • @Jimmy, ...and jq isn't limited to JSON in. It can read raw strings (see the `-R` option), and has regex support (so it can parse any syntax you see fit to send it). – Charles Duffy Aug 10 '16 at 00:11
  • @Jimmy, ...not limited to JSON out, either, for that matter; the current version additionally supports writing CSV, %-encoded URIs, HTML, POSIX-sh-compliant shell-escaped syntax, and base64-encoded literal strings. – Charles Duffy Aug 10 '16 at 00:51
  • jq was my first port of call, and I can certainly see the advantage of sticking with a tool like jq (fewer dependencies), but the Ruby was very simple and clean (e.g. http://stackoverflow.com/questions/5357711/csv-to-json-ruby-script). It had trouble with field names with a % in them. Will definitely try jq again next time – Jimmy Aug 10 '16 at 07:24
  • @Jimmy, ...to be fair, I missed the CSV-input spec earlier. While `jq` supports native CSV output, since it doesn't have parsing support that extends to the quirks of the language (regexes being a poor tool for dealing with quoting and escaping semantics and the like), I might have ended up using a language with a native CSV-parsing library in that case myself as well. – Charles Duffy Aug 10 '16 at 14:53
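To illustrate the output formats mentioned in the comments above, here is a minimal sketch (the input strings are invented for demonstration; `@csv`, `@uri`, and `@base64` are standard jq 1.5 format strings):

$ jq -rn '["a","b,c"] | @csv'
"a","b,c"
$ jq -rn '"a value with spaces" | @uri'
a%20value%20with%20spaces
$ jq -rn '"hello" | @base64'
aGVsbG8=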

7 Answers


Prerequisite

For all of the below, it's assumed that your content is in a shell variable named s:

s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'

What (modern jq)

# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input  | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"

How (modern jq)

This requires very new (probably 1.5?) jq to work, and is a dense chunk of code. To break it down (a standalone illustration of the last three steps follows this list):

  • Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
  • With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
  • With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
  • With from_entries, we're combining those pairs into objects containing those keys and values.
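As a standalone illustration of those last three steps, with literal stand-ins for $keys and $vals:

$ jq -nc '[["CONTAINER","CPU%"],["nginx_container","0.02%"]] | transpose'
[["CONTAINER","nginx_container"],["CPU%","0.02%"]]
$ jq -nc '[[["CONTAINER","CPU%"],["nginx_container","0.02%"]] | transpose[] | {key:.[0],value:.[1]}] | from_entries'
{"CONTAINER":"nginx_container","CPU%":"0.02%"}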

What (shell-assisted)

This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:

{
    IFS='|' read -r -a keys # read first line into an array of strings

    # read each subsequent line into an array named "values"
    while IFS='|' read -r -a values; do

        # setup: positional arguments to pass in literal variables, query with code
        jq_args=( )
        jq_query='.'

        # copy values into the arguments, reference them from the generated code
        for idx in "${!values[@]}"; do
            [[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
            jq_args+=( --arg "key$idx"   "${keys[$idx]}"   )
            jq_args+=( --arg "value$idx" "${values[$idx]}" )
            jq_query+=" | .[\$key${idx}]=\$value${idx}"
        done

        # run the generated command
        jq "${jq_args[@]}" "$jq_query" <<<'{}'
    done
} <<<"$s"

How (shell-assisted)

The invoked jq command from the above is similar to:

jq --arg key0   'CONTAINER' \
   --arg value0 'nginx_container' \
   --arg key1   'CPU%' \
   --arg value1 '0.02%' \
   --arg key2   'MEMUSAGE/LIMIT' \
   --arg value2 '25.09MiB/15.26GiB' \
   '. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
   <<<'{}'

...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
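The out-of-band passing is what makes this robust: a value containing JSON metacharacters arrives intact, whereas interpolating it into the filter text would be a syntax error at best and an injection vector at worst. A contrived illustration:

$ jq -n --arg v 'a "quoted" value' '{example: $v}'
{
  "example": "a \"quoted\" value"
}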


Result

Either of the above will emit:

{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
  "MEM%": "0.16%",
  "NETI/O": "0B/0B",
  "BLOCKI/O": "22.09MB/4.096kB",
  "PIDS": "0"
}

Why

In short: Because it's guaranteed to generate valid JSON as output.

Consider the following as an example that would break more naive approaches:

s='key ending in a backslash\
value "with quotes"'

Sure, these are unexpected scenarios, but jq knows how to deal with them:

{
  "key ending in a backslash\\": "value \"with quotes\""
}

...whereas an implementation that didn't understand JSON strings could easily end up emitting:

{
  "key ending in a backslash\": "value "with quotes""
}
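You can reproduce that safe handling directly with the --arg technique from above, using those same strings:

$ jq -n --arg k 'key ending in a backslash\' --arg v 'value "with quotes"' '{($k): $v}'
{
  "key ending in a backslash\\": "value \"with quotes\""
}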
Charles Duffy
  • It'd be easier if you used `transpose` since you already have arrays of keys and values. Transposing effectively zips them together which will allow you to build out the object rather easily. – Jeff Mercado Aug 10 '16 at 02:14
  • @JeffMercado, I feel like I'm missing something that should be obvious -- is there an idiom that makes more sense than `[$keys, $vals] | transpose | [ .[] | {"key": .[0], "value": .[1]} ] | from_entries`? – Charles Duffy Aug 10 '16 at 02:29
  • I don't know if there's an idiomatic way to do this, but I see a number of ways it could be achieved. I personally like using `from_entries`: `[[$keys,$values] | transpose[] | {key:.[0],value:.[1]}] | from_entries`. Or create objects out of the pairs and add them up: `[[$keys,$values] | transpose[] | {(.[0]):.[1]}] | add`. Or using `reduce` to assign the values: `reduce ([$keys,$values] | transpose[]) as $p ({}; .[$p[0]] = $p[1])` – Jeff Mercado Aug 10 '16 at 02:43
  • Oooh -- that usage of `add` is a trick I hadn't been aware of. Shiny! :) – Charles Duffy Aug 10 '16 at 03:48
  • Amended to go the `from_entries` route; it's probably more accessible to folks without a functional-programming background. @JeffMercado, thank you again for the suggestion. – Charles Duffy Aug 10 '16 at 04:12
  • @chepner, thank you for the addendum. Does it work in the more-than-two-line case as currently written? – Charles Duffy Aug 10 '16 at 12:58
  • Sadly, no. (I forgot that yours did.) I'll see if I can adjust at the cost of some brevity. – chepner Aug 10 '16 at 13:05
  • This appears to work: `jq -Rn '(input|split("|")) as $keys | (inputs | split("|")) as $vals | [[$keys, $vals] | transpose[] ...`. I don't see a way to avoid the variables altogether yet, probably due to the order in which the filters are evaluated. – chepner Aug 10 '16 at 14:41
  • @chepner, ...heh -- while you were writing that up, I put together something very similar, leveraging your prior suggestion. `$vals` could be avoided easily enough, but I'm kinda' fond of having it for readability. – Charles Duffy Aug 10 '16 at 14:46
  • @Charles Duffy could you give me an idea or point me in the right direction to do this for json with nested objects and arrays ? – user3738936 Feb 01 '18 at 14:10
  • @user3738936, I'd suggest asking a separate question with a working (correctly parsing) sample of your input format and concrete expected output, if you can't find anything already covering the space. Feel free to @-notify me from a comment on that question when it's together. – Charles Duffy Feb 01 '18 at 16:01

I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo

A quick and easy example:

$ jo my_variable="simple"
{"my_variable":"simple"}

A little more complex

$ jo -p name=jo n=17 parser=false
{
  "name": "jo",
  "n": 17,
  "parser": false
}

Add an array

$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
  "name": "jo",
  "n": 17,
  "parser": false,
  "my_array": [
    1,
    2,
    3,
    4,
    5
  ]
}

I've made some pretty complex stuff with jo, and the nice thing is that you don't have to roll your own solution and worry about the possibility of producing invalid JSON.
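Applied to the data in the question, a sketch might look like the following (this assumes jo passes key names such as CPU% and MEMUSAGE/LIMIT through untouched; jo reserves characters like @ and % on the value side, so it's worth verifying against your version):

$ jo -p "CONTAINER=nginx_container" "CPU%=0.02%" "MEMUSAGE/LIMIT=25.09MiB/15.26GiB"
{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB"
}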

Jim

You can ask docker to give you JSON data in the first place:

docker stats --format "{{json .}}"

For more on this, see: https://docs.docker.com/config/formatting/
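For a one-shot sample rather than a continuously updating stream, docker stats supports the --no-stream flag, and the JSON can be piped straight into jq (note that Docker's own key names differ from the pipe-delimited header in the question):

docker stats --no-stream --format '{{json .}}' nginx_container | jq .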

MatrixManAtYrService
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0

cat ~/newfile | while IFS='|' read -r CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
    if [[ "$LOOPNUM" = 0 ]]; then
        JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
        LOOPNUM=$(( LOOPNUM+1 ))
    else
        echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
    fi 
done

Returns:

{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Nick Bull
  • 9,518
  • 6
  • 36
  • 58
  • FYI -- see fourth paragraph of http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html, specifying conventions for environment variable names: The OS and shell use names with uppercase characters only, whereas lowercase names are "reserved for applications", and it's guaranteed that applications can define any name in that space without modifying standard-utility behavior. This convention has impact on shell variables as well because they share a namespace: Using an environment variable's name for a shell variable overwrites that environment variable, causing a conflict. – Charles Duffy Aug 10 '16 at 00:06
  • I'd separate reading the header from the loop so that you can avoid the `LOOPNUM` logic: `... | { IFS=: read -a jsonnames; while IFS=: read ...; do echo ...; done; }`. – chepner Aug 10 '16 at 12:45
  • BTW, piping into a loop means that you can't retain state past that loop's exit [absent a very new bash with the `lastpipe` option, or a shell where this is default behavior such as `ksh`]. See BashFAQ #24 (http://mywiki.wooledge.org/BashFAQ/024) -- `while ...; done` – Charles Duffy Aug 10 '16 at 16:02
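A minimal sketch combining the two suggestions above (the header is read once, so the LOOPNUM flag goes away, and redirecting into the compound command instead of piping lets loop state survive); the ~/newfile path is from the answer, and the JSON-escaping caveats from the accepted answer still apply:

{
    IFS='|' read -r -a jsonnames             # header line, read once
    while IFS='|' read -r -a fields; do      # one iteration per data line
        out='{ '
        for i in "${!fields[@]}"; do
            (( i > 0 )) && out+=', '
            out+="\"${jsonnames[i]}\": \"${fields[i]}\""
        done
        echo "$out }"
    done
} < ~/newfile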

Here is a solution which uses the -R and -s options along with transpose, invoked here against the $s variable defined earlier on this page:

   split("\n")                       # [ "CONTAINER...", "nginx_container|0.02%...", ...]
 | (.[0]    | split("|")) as $keys   # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
 | (.[1:][] | split("|"))            # [ "nginx_container", "0.02%", ... ] [ ... ] ...
 | select(length > 0)                # (remove empty [] caused by trailing newline)
 | [$keys, .]                        # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
 | [ transpose[] | {(.[0]):.[1]} ]   # [ {"CONTAINER": "nginx_container"}, ... ] ...
 | add                               # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
jq170727

json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"

Not using jq, but it's possible to use arguments and environment variables as the values. (Note that literal % characters in a printf format string have to be doubled as %%, since % otherwise introduces a conversion specifier.)

CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"

NoamG

If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk, to turn it into JSON, and then use jq to work with it further.

echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
        | sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
        | jq '.[] | with_entries(select(.key|test("^a.*")|not))'

    {
      "CONTAINER": "nginx_container",
      "CPU%": "0.02%",
      "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
      "MEM%": "0.16%",
      "NETI/O": "0B/0B",
      "BLOCKI/O": "22.09MB/4.096kB",
      "PIDS": "0"
    }

Without jq, sqawk gives a bit too much:

[
  {
    "anr": "1",
    "anf": "7",
    "a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
    "CONTAINER": "nginx_container",
    "CPU%": "0.02%",
    "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
    "MEM%": "0.16%",
    "NETI/O": "0B/0B",
    "BLOCKI/O": "22.09MB/4.096kB",
    "PIDS": "0",
    "a8": "",
    "a9": "",
    "a10": ""
  }
]
MatrixManAtYrService