1

There are many questions that say "I have this exact JSON structure, please do my homework for me and make it have this other exact structure." This is NOT one of those questions.

I want to see jq scripts for cleaning up generic data. I will post an answer below that works pretty well for most of the data I get out of my logs. But, seeing as there are many smart people out here, I want to see what others come up with.

Note: This is not a duplicate of other questions, because the nature of every question I have seen on SO results in answers that are pretty much only useful to the OP. The answers to this question should be helpful to everyone Googling for something similar. Some lucky answerer will get a reputation bomb like I did with my answer here.

Bruno Bronosky
  • 66,273
  • 12
  • 162
  • 149
  • "cleaning up generic data" is rather broad, at least for someone who is not familiar with your logs. Even though you have a generic question, following the [mcve] guidelines would still be a good idea. – peak Feb 06 '19 at 22:04
  • If my only direction was "clean up generic data", that would be too broad. But I titled the question to be JSON in JSON specific and gave an example. The part that I want to be generic is the structure of the source, not the problem to be corrected. – Bruno Bronosky Feb 07 '19 at 00:54
  • The mcve is usually included in the Q, not the A. The example in the A is not really minimal and I didn't find it very illuminating e,g, w.r.t. the `tostring` representation of arrays. – peak Feb 07 '19 at 03:34
  • @peak I'm sorry you don't find this very mcve or illuminating, but I find your answer amazingly enlightening. It is tremendously more useful to Google visitors than your answer to https://stackoverflow.com/a/42937081/117471 which could have also been solved with the answer you gave here. That is the reason I wanted to not give a data structure in the Q. – Bruno Bronosky Feb 07 '19 at 04:04

2 Answers2

1

Running fromjson on all strings could be done, e.g. using walk and ?:

walk(if type == "string" then fromjson? // . else . end)

Leaving numeric strings alone

walk(if type == "string"
     then if tonumber? // false then . 
          else fromjson? // . end
     else . end)
peak
  • 105,803
  • 17
  • 152
  • 177
  • That's pretty effective. It does have an unfortunate side effect of converting `"EventVersion": "1.0"` to `"EventVersion": 1.0` from my sample JSON, but I like this very much for a "I need to be able to read this minified mess" tool. Thanks for the contribution. – Bruno Bronosky Feb 07 '19 at 03:21
  • If you don't want numeric strings to be converted to numbers, then you can easily avoid that by using `tonumber?` or `test` with a suitable regex. – peak Feb 07 '19 at 03:26
  • Wow! This answer has been a great learning experience for me just trying to understand it. Before today I had never even used a function in `jq`. Now I'm trying to get my head around `?` and `//`, not to be confused with `?//`. – Bruno Bronosky Feb 07 '19 at 04:21
0

Assuming all embedded JSON is a minified dictionary (as is the case with my Terraform output, AWS cli output, and AWS logs) this one jq script works wonders.

jq 'walk(if type == "string" and .[0:2] == "{\"" then .=(.|fromjson) else . end)'

It works by walking the json object looking for strings that begin with {" and uses a subprocess to pipe them through fromjson (never leaving jq).

I put it in a bash function (jqp) because it's easier than escaping the quotes for an alias and MUCH more flexible. Then I can use it to process a file or clipboard contents.

# This is in my .bash_profile
jqp(){
  jq 'walk(if type == "string" and .[0:2] == "{\"" then .=(.|fromjson) else . end)' "$@"
}
# Here is an event trigger from SNS to Lambda
$ cat event.json
{
    "Records": [
        {
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:sns:us-east-1:123456789012:sns-to-slack-shared-services:a70df027-2c7f-492a-840a-633d44fd71a6",
            "EventSource": "aws:sns",
            "Sns": {
                "SignatureVersion": "1",
                "Timestamp": "2019-02-06T15:50:30.028Z",
                "Signature": "GN3712/aWjVLftSzdOcW5Zm32/uvfZKrCcvTmz6Obv/AXbz1xc22sTMYt2vFja7coHGhhO5bG6dz/IbJSx/Zm0U/dVVefWKukFl1umP3av+1JoUbbi+4uHai3k3AwQa3wR4HWjrKKmMt+Tkt/gm7jvhcuojtx+n5oc4S6bMsVq5OmSfAWd2Xd1urTm2zeGCL59nbfhZv+xB4db3dk62FtxVKtFXtvO2pH27+E3vXUvgu2k1c2Kd/Vt/vbYCAA==",
                "SigningCertUrl": "https://sns.us-east-1.amazonaws.com/SimpleNotificationService-a70df027-2c7f-492a-840a-633d44fd71a6.pem",
                "MessageId": "a8df3067-c347-55ce-b869-64b2c7c1d0a3",
                "Message": "{\"AlarmName\":\"unauthorized_api_calls_Count-alarm\",\"AlarmDescription\":\"This metric monitors unauthorized API calls\",\"AWSAccountId\":\"123456789012\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 datapoint [5.0 (06/02/19 15:45:00)] was greater than or equal to the threshold (1.0).\",\"StateChangeTime\":\"2019-02-06T15:50:30.023+0000\",\"Region\":\"US East (N. Virginia)\",\"OldStateValue\":\"INSUFFICIENT_DATA\",\"Trigger\":{\"MetricName\":\"unauthorized_api_calls\",\"Namespace\":\"security_rules\",\"StatisticType\":\"Statistic\",\"Statistic\":\"SUM\",\"Unit\":null,\"Dimensions\":[],\"Period\":300,\"EvaluationPeriods\":1,\"ComparisonOperator\":\"GreaterThanOrEqualToThreshold\",\"Threshold\":1.0,\"TreatMissingData\":\"\",\"EvaluateLowSampleCountPercentile\":\"\"}}",
                "MessageAttributes": {},
                "Type": "Notification",
                "UnsubscribeUrl": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:123456789012:sns-to-slack-shared-services:a70df027-2c7f-492a-840a-633d44fd71a6",
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:sns-to-slack-shared-services",
                "Subject": "ALARM: \"unauthorized_api_calls_Count-alarm\" in US East (N. Virginia)"
            }
        }
    ]
}

# Demonstrate that "$@" in the function allows the use of extra options
$ jqp --indent 4 event.json
{
    "Records": [
        {
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:sns:us-east-1:123456789012:sns-to-slack-shared-services:a70df027-2c7f-492a-840a-633d44fd71a6",
            "EventSource": "aws:sns",
            "Sns": {
                "SignatureVersion": "1",
                "Timestamp": "2019-02-06T15:50:30.028Z",
                "Signature": "GN3712/aWjVLftSzdOcW5Zm32/uvfZKrCcvTmz6Obv/AXbz1xc22sTMYt2vFja7coHGhhO5bG6dz/IbJSx/Zm0U/dVVefWKukFl1umP3av+1JoUbbi+4uHai3k3AwQa3wR4HWjrKKmMt+Tkt/gm7jvhcuojtx+n5oc4S6bMsVq5OmSfAWd2Xd1urTm2zeGCL59nbfhZv+xB4db3dk62FtxVKtFXtvO2pH27+E3vXUvgu2k1c2Kd/Vt/vbYCAA==",
                "SigningCertUrl": "https://sns.us-east-1.amazonaws.com/SimpleNotificationService-a70df027-2c7f-492a-840a-633d44fd71a6.pem",
                "MessageId": "a8df3067-c347-55ce-b869-64b2c7c1d0a3",
                "Message": {
                    "AlarmName": "unauthorized_api_calls_Count-alarm",
                    "AlarmDescription": "This metric monitors unauthorized API calls",
                    "AWSAccountId": "123456789012",
                    "NewStateValue": "ALARM",
                    "NewStateReason": "Threshold Crossed: 1 datapoint [5.0 (06/02/19 15:45:00)] was greater than or equal to the threshold (1.0).",
                    "StateChangeTime": "2019-02-06T15:50:30.023+0000",
                    "Region": "US East (N. Virginia)",
                    "OldStateValue": "INSUFFICIENT_DATA",
                    "Trigger": {
                        "MetricName": "unauthorized_api_calls",
                        "Namespace": "security_rules",
                        "StatisticType": "Statistic",
                        "Statistic": "SUM",
                        "Unit": null,
                        "Dimensions": [],
                        "Period": 300,
                        "EvaluationPeriods": 1,
                        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                        "Threshold": 1,
                        "TreatMissingData": "",
                        "EvaluateLowSampleCountPercentile": ""
                    }
                },
                "MessageAttributes": {},
                "Type": "Notification",
                "UnsubscribeUrl": "https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-east-1:123456789012:sns-to-slack-shared-services:a70df027-2c7f-492a-840a-633d44fd71a6",
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:sns-to-slack-shared-services",
                "Subject": "ALARM: \"unauthorized_api_calls_Count-alarm\" in US East (N. Virginia)"
            }
        }
    ]
}
Bruno Bronosky
  • 66,273
  • 12
  • 162
  • 149