0

I have an upstream application sending following JSON message to an SNS topic. We use AWS lambda function to save this JSON object in S3:

{
"processResult": {
    "processName": "XYZ",
    "stageResults": {
        "Read Files": {
            "status": "PROCESSED",
            "error": "",
            "timeTaken": 26064469473
        },
        "Convert Files": {
            "status": "PROCESSED",
            "error": "",
            "timeTaken": 97968896
        }
    },
    "processMetrics": {
        "filesProcessed": 1157,
        "filesWithExceptionCount": 1,
        "timeTaken": "367.460031s",
        "metricsCalcTime": "6.061847s",
        "totalTimeTaken": "373.521878s"
    },
    "succeeded": true
  }
}

If you observe within the "stageResults" STRUCT, I have the attributes with spaces, like "Read Files" and "Convert Files" and when i am trying to read this JSON from S3 using AWS Athena (AWS Glue Crawler), I am getting following error:

    HIVE_METASTORE_ERROR: com.facebook.presto.spi.PrestoException: Error: : expected at the position 51 of 'struct<....
......but ' ' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)

But when I manually edit the JSON message by modifying "stageResults" STRUCT like "Read_Files" and "Convert_Files", I was able to read and query the JSON using AWS Athena tables successfully.

Following is the piece of code from AWS Lambda which triggers on SNS topic, reads JSON message and saves it in S3:

import json
import boto3
import random
import string

file_name = ''.join([random.choice(string.ascii_lowercase) for i in range(16)])

def lambda_handler(event, context):
    target_bucket = 'bucket-name'
    target_key = 'input=clientdata/'  + file_name + '.json'
    s3 = boto3.resource('s3')
    for record in event['Records']:
        payload = record["body"]
        data = json.loads(payload)
        print(data)
        print("copying JSON message...")
        s3.Object('target_bucket', 'target_key').put(
            Body=(bytes(json.dumps(data).encode('UTF-8')))
        )

Now, I want to know how to modify this JSON message before saving it to S3 using AWS Lambda. Any help is appreciated.

qubiter
  • 235
  • 1
  • 5
  • 19
  • Since your question comes down to changing the name of a dictionary key I advise you to take a look at [this](https://stackoverflow.com/questions/16475384/rename-a-dictionary-key) – yorodm Feb 13 '19 at 19:36
  • thats right but i dont know what keys i will be receiving. All i need to do is to remove the spaces in the dictionary key before saving it. – qubiter Feb 13 '19 at 19:50

1 Answers1

0

If all you want is to change every key containing spaces into something suitable for your needs this might do:

import re
wrong_keys = []
for key in data.keys():
    if key.strip().find(" ") != -1:
        wrong_keys.append(key)
for key in wrong_key:
  data[re.sub(r'[\W]+', '_',key)] = data.pop(key)
yorodm
  • 4,359
  • 24
  • 32