64

I am trying to use Logs Insights with data containing JSON in one of the fields, and to parse the JSON fields

My data looks like the following when I put it in insights with the starter code

fields @timestamp, @message
| sort @timestamp desc
| limit 25

How can I easily extract the path variable in my nested JSON to perform aggregations on it ? By looking at some documentation, I thought @message.path would work but it does not seem so. Has anyone successfully interpreted JSON logs in Insights

enter image description here

EDIT : Sample of what my data looks like

#
@timestamp
@message
1
2018-12-19 23:42:52.000
I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/ID","format":"json","controller":"API::V1::Public::ProfessionalsController","action":"show","status":200,"duration":285.27,"view":222.36,"time":"2018-12-19T23:42:52.344+00:00","params":{"include":"user,tags,promotions,company_sector,similar_professionals.tags,similar_professionals.user","format":"json","compress":false,"id":"ID"},"@timestamp":"2018-12-19T23:42:52.629Z","@version":"1","message":"[200] GET /api/v1/professionals/ID (API::V1::Public::ProfessionalsController#show)"}
@logStream  i-05d1d61ab853517a0
@message  I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/ID","format":"json","controller":"API::V1::Public::ProfessionalsController","action":"show","status":200,"duration":285.27,"view":222.36,"time":"2018-12-19T23:42:52.344+00:00","params":{"include":"xxx","format":"json","compress":false,"id":"ID"},"@timestamp":"2018-12-19T23:42:52.629Z","@version":"1","message":"[200] GET /api/v1/professionals/ID (API::V1::Public::ProfessionalsController#show)"}
@timestamp  1545262972000
2
2018-12-19 23:42:16.000
I, [2018-12-19T23:42:16.723472 #851] INFO -- : [ea712503-eb86-4a6e-ab38-ddbcd6c2b4d0] {"method":"GET","path":"/api/v1/heartbeats/new","format":"json","controller":"API::V1::Public::HeartbeatsController","action":"new","status":201,"duration":9.97,"view":3.2,"time":"2018-12-19T23:42:16.712+00:00","params":{"format":"json","compress":false},"@timestamp":"2018-12-19T23:42:16.722Z","@version":"1","message":"[201] GET /api/v1/heartbeats/new (API::V1::Public::HeartbeatsController#new)"}
Cyril Duchon-Doris
  • 12,964
  • 9
  • 77
  • 164
  • What we need is something like `| filter json(@message).method = 'POST'`, or if `@message` has a prefix before the json, `| parse @message /(?<@rawJson>[{].*[}])/ | filter json(@rawJson).method = 'POST'` (or even something like `| parse json(@rawJson) as @parsedJson`). I guess this could enable devs to write inefficient queries, but such a `json` command would give us tons of power too. – Gershom Maes May 31 '23 at 16:53

5 Answers5

62

CloudWatch Insights Logs automatically discovers fields for the following log types:

Lambda logs

CloudWatch Logs Insights automatically discovers log fields in Lambda logs, but only for the first embedded JSON fragment in each log event (Note: emphasis mine). If a Lambda log event contains multiple JSON fragments, you can parse and extract the log fields by using the parse command. For more information, see Fields in JSON Logs.

CloudTrail logs

See fields in JSON logs.

Source: Supported Logs and Discovered Fields

If @message is I, [2018-12-11T13:20:27] INFO -- : {"method":"GET"}

Then you can select and filter the fields like so:

fields @timestamp, @message, method
| filter method = "GET"
| sort @timestamp desc

It works with nested fields too, i.e. params.format = "json" or results.0.firstName = "Paul".

Community
  • 1
  • 1
pyb
  • 4,813
  • 2
  • 27
  • 45
  • Hi, this does not seem to work. I'll edit my question to include samples of what I can see using the default code. Maybe I am missing something ? What is your method `jsonresult` ? is it documented somewhere ?I do have extra ID stuff in my data just after the `-- :` – Cyril Duchon-Doris Dec 19 '18 at 23:47
  • Sorry, jsonresult was a field which I didn't clean up. I removed it. Your examples aren't formatted well right now. My data had the JSON objet at the end but it doesn't seem to matter. Are the JSON fields showing using autocomplete or in thr Discovery section on the right? – pyb Dec 20 '18 at 00:10
  • @pyb My JSON isn't being auto discovered either, just the three default `@`s Can you post an example of your `@message` so I can work out why yours is parsing JSON and mine (and assumedly Cyril's too) is not – GusGold Dec 21 '18 at 00:31
  • @GusGold Ok. My `@message` is just a valid JSON prefixed with a sentence: `Something something JSON: { "url": "https://red.act.ed", "foo": null, "results": [ { "first_name": "Homer", "last_name": "Simpson" }, { "first_name": "Stack", "last_name": "Overflow" } ], "another_array": [ { "foo": "bar", "barbaz": 0.2 } ]}`. – pyb Dec 21 '18 at 15:48
  • @CyrilDuchon-Doris The doc states it only works for Lambda and CloudTrail logs, I updated the answer with details. What kind of logs are you using? – pyb Dec 21 '18 at 23:19
  • 1
    Logs synchronized by my app via the `awslogs` agent installed on Amazon Linux 2. – Cyril Duchon-Doris Dec 31 '18 at 11:06
  • Then I'm sorry, this answer doesn't apply to your use case (yet). – pyb Dec 31 '18 at 12:34
21

You can use the parse command to extract fields.

If @message is

I, [2018-12-11T13:20:27] INFO -- : {"method":"GET"}

Then you extract the fields like so:

fields @timestamp, @message
| parse "I, [*T*] INFO -- : {"method":"*"}" as @date, @time, @method
| filter method=GET
| sort @timestamp desc
| limit 20

The documentation is rather light for now. I am able to get results by replacing the wildcard * with a regular expression, but then the parsing fails.

pyb
  • 4,813
  • 2
  • 27
  • 45
  • 4
    So basically back to raw string parsing huh... The actually query I had to use was `parse 'I, [*T*] * -- : * {*"method":"*",*}' as date, time, severity, id, json_before, method, json_after` since * is greedy (also you forgot to use simple quotes for the parsing expression). It will also totally fail if the JSON keys are not always in the same order. I hope AWS adds support for parsin JSON soon =_= – Cyril Duchon-Doris Dec 19 '18 at 21:17
  • @CyrilDuchon-Doris I tried combining the [CloudWatch Logs syntax](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html) with parse like so: `| parse "$.method" as @method` but it did not work. Maybe it's possible but I don't know of a better way. Since it's a recent product (less than a month), this feature may be added later. – pyb Dec 19 '18 at 21:43
21

Building on @pyb insights, I was able to use parse @message '"path":"*"' as path to extract the path from any place in the @message.

You can go on to get your method by piping another parse @message '"method":"*"' as method without concern for ordering as it is a second global plain text search on @message

In the case where your @message is:

I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/ID","format":"json","controller":"API::V1::Public::ProfessionalsController","action":"show","status":200,"duration":285.27,"view":222.36,"time":"2018-12-19T23:42:52.344+00:00","params":{"include":"xxx","format":"json","compress":false,"id":"ID"},"@timestamp":"2018-12-19T23:42:52.629Z","@version":"1","message":"[200] GET /api/v1/professionals/ID (API::V1::Public::ProfessionalsController#show)"}

Using:

parse @message '"path":"*"' as path | parse @message '"method":"*"' as method

will result in the fields: path = '/api/v1/professionals/ID' and method = 'GET'

Note that this is still simply string parsing and as such, it has no concept of nested keys like params.format would not find json, however using just format would, so long as there wasn't another format string anywhere in your @message.

Also note that this is for the case where Insights is not discovering your JSON in the message. I belive this is the case that @pyb was referring to in this answer. My logs aren't being discovered either using the following format

info - Request: {"method":"POST","path":"/auth/login/","body":{"login":{"email":"email@example.com","password":"********"}},"uuid":"36d76df2-aec4-4549-8b73-f237e8f14e23","ip":"*.*.*.*"}
GusGold
  • 470
  • 5
  • 18
8

Also another parse with help of regex

Assume your @message is:

I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/"}

You can extract method by

fields @timestamp, @message
| parse @message /\"method\":\"(?<method_type>.*?)\"/
Neyma
  • 193
  • 3
  • 9
  • 1
    This works but you don't need to do this - the answer "CloudWatch Insights Logs automatically discovers fields for the following log types:" is the better approach, since the discovery is automatic. – James Beswick Oct 22 '20 at 12:05
  • @JamesBeswick: discovery is not happening in my case. the following is my ebs web log for @message.. if i try to read any propery or with dot notaion (as per docs) its not discovering that. `` Aug 5 18:21:04 ip-172-31-94-196 web: {"level":"debug","name":"arch","version":"0.0.1","environment":"production","timestamp":"2022-08-05T18:21:04.134Z","correlationId":"df4c3cdf-ab9b-4478-a6c2-e9f0bf3c72bc","message":{"event":"request-end","status":200,"method":"GET","url":"/v1/user?address=abc","latency":681}} ``` – Sedhu Aug 05 '22 at 18:30
1

Would this regex query within the parse command help you?

filter @message like / \"path\":\"/
| parse @message /(?<@endpt>((\/[a-zA-Z0-9_{}()-?]+){1,}))/

Good luck!

R0B0T-B0BA
  • 381
  • 1
  • 5