
I am parsing a log file and, using grep, get result lines like the following:

2017-01-26 17:19:40 +0000 docker: {"source":"stdout","log":"I, [2017-01-26T17:19:40.703988 #24]  INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}","container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300"}
2017-01-26 17:19:40 +0000 docker: {"container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300","source":"stdout","log":"I, [2017-01-26T17:19:40.704364 #24]  INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"}

I then extract the JSON objects with the following command:

... | grep -o -E "\{.*$"

I know I can parse a single line with python -mjson.tool like so:

... | grep -o -E "\{.*$" | tail -n1 | python -mjson.tool

But I want to parse both lines (or n lines). How can I do this in bash? (I think xargs is supposed to let me do this, but I am new to the tool and can't figure it out.)

codeforester
Nathan Hanna
  • Why not just put it all in a python script? – user2864740 Jan 27 '17 at 23:08
  • If you want to parse JSON in a shell script, get the `jq` tool. – Barmar Jan 27 '17 at 23:11
  • You saw this one? http://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools – triplem Jan 27 '17 at 23:12
  • If you want to do this on an ongoing/streaming basis, it's something that's liable to be in the domain of [Logstash](https://www.elastic.co/products/logstash) -- certainly, part of what it's built to do *well* (at-scale with good error handling). – Charles Duffy Jan 27 '17 at 23:15
  • That said, to be clear, you *could* do this with `jq` -- it's perfectly capable of taking raw text as input, extracting a substring and parsing that substring as JSON. – Charles Duffy Jan 27 '17 at 23:17
  • btw -- `xargs` is responsible for transforming stdin to *argument lists*. Since you don't want to pass your JSON as a command-line argument to `python -m json.tool`, it's not an appropriate tool for the job. – Charles Duffy Jan 28 '17 at 16:10

1 Answer


jq can be told to accept plain text as input, and attempt to parse an extracted subset as JSON. Consider the following example, tested with jq 1.5:

jq -R 'capture("docker: (?<json>[{].*[}])$") | .json? | select(.) | fromjson' <<'EOF'
2017-01-26 17:19:40 +0000 docker: {"source":"stdout","log":"I, [2017-01-26T17:19:40.703988 #24]  INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}","container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300"}
2017-01-26 17:19:40 +0000 docker: {"container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300","source":"stdout","log":"I, [2017-01-26T17:19:40.704364 #24]  INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"}
EOF

...properly yields:

{
  "source": "stdout",
  "log": "I, [2017-01-26T17:19:40.703988 #24]  INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}",
  "container_id": "6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e",
  "container_name": "/test-container-b49c8188c3ebe4b93300"
}
{
  "container_id": "6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e",
  "container_name": "/test-container-b49c8188c3ebe4b93300",
  "source": "stdout",
  "log": "I, [2017-01-26T17:19:40.704364 #24]  INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"
}
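
Since the question feeds lines through a pipeline rather than a heredoc, the same filter can be wrapped in a small shell function that reads stdin. This is a sketch under assumptions: the function name `extract_docker_json` is hypothetical, and any arguments given to it are simply passed through to `jq` as extra options.

```shell
# Hypothetical wrapper around the filter above; reads raw log lines on stdin.
# Extra arguments (for example -c for compact output) are passed to jq.
extract_docker_json() {
  jq -R "$@" 'capture("docker: (?<json>[{].*[}])$") | .json? | select(.) | fromjson'
}
```

For example, `grep docker logfile | extract_docker_json` would pretty-print each embedded object (with `logfile` standing in for the real log path), and lines that do not match the pattern are skipped because `capture` emits nothing for them.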
Charles Duffy