0

I have a dictionary returned by a response in AWS Lambdas which looks like below:

    data = {
        "id": 1,
        "name": "Hey",
        "category": null,
        "salary": 1222,
        "Roles": [
            {
                "attributes1": {
                "key": "sum_assured",
                "value": 192000
            },
                "attributes2": {
                    "key": "cash_value",
                    "value": 48576
                },
                "attributes3": {
                    "key": "expiry_date",
                    "value": null
                }
            }
        ]
    
    }

Now, I want to format a few numeric columns like salary to 1222.00 and convert other non-string columns to strings for "ID" I should get "1" as a string and not an Int. Here, null is the actual NaN returned by the response.

Below is the code which I tried:

new_dict = {}
for i, j in data.items():
    print(i)
    if (i == "salary"):
        new_dict[i] = "%0.2f" % j
    if isinstance(j, str):
        new_dict[i] = j
    elif j is None:
        new_dict[i] = ""
    else:
        new_dict[i] = str(j)

I know that I could maintain a list for such amount related columns and then I can do some kind of comparisons and replace it. When I test the value for salary inside the loop it gives me decimal places but in the response, it doesn't show up.

Output:

{
  "id": "1",
  "name": "Hey",
  "category": "",
  "salary": "1222",
  "Roles": "[{'attributes4': {'key': 'sum_assured', 'value': 192000}, 'attributes5': {'key': 'cash_value', 'value': 48576}, 'attributes6': {'key': 'expiry_date', 'value': 'null'}}]"
}

Expected:

{
  "id": "1",
  "name": "Hey",
  "category": "",
  "salary": "1222",
  "Roles": "[{"attributes4": {"key": "sum_assured", "value": "192000.00"}, "attributes5": {"key": "cash_value", "value": "48576.00"}, "attributes6": {"key": "expiry_date", "value": ""}}]"
}

Also, how could I format the records of the Roles array with some kind of indent in it? I wasn't able to do that.

Update 1:

I tried the below code for accessing the nested attributes as per my JSON but it gives me a key error:

for i, j in formatted_json.items():
    print(type(j))
    if isinstance(j, list):
        for k in j:
            for key, value in k.items():
                for l, m in value.items():
                    if (m == "salary" or m == 'value'):
                        new_dict[key][l] = "%0.2f" % m
                    elif isinstance(m, str):
                        new_dict[key][l] = m
                    elif m is None:
                        new_dict[key][l] = ""
                    else:
                        new_dict[key][l] = str(m)

    elif (i == "salary" or i == 'value'):
        new_dict[i] = "%0.2f" % j
    elif isinstance(j, str):
        new_dict[i] = j
    elif j is None:
        new_dict[i] = ""
    else:
        new_dict[i] = str(j)



Traceback (most recent call last):
  File "/home/.../Downloads/pyspark/src/sample/json_to_string.py", line 392, in <module>
    new_dict[key][l] = m
KeyError: 'attributes4'  

Update 2:

The actual dataset is as below:

json_data = '''{
    "policy_no": "",
    "paid_to_date": "2016-09-28",
    "proposal_no": null,
    "agent_code": "1234",
    "policy_status": "Death",
    "policy_term": "15",
    "policy_premium_payment_term": "7",
    "policy_effective_date": "2013-09-28",
    "last_premium_paid": 13790.85,
    "last_premium_paid_date": "2016-06-17",
    "pivc_flag_status": null,
    "product_code": "",
    "product_name": "Monthly Income Plan",
    "application_date": "2013-09-16",
    "underwriting_decesion_date": "2013-09-26",
    "issuance_date": "2013-09-30",
    "ols_policy": "N",
    "next_premium_due_date": "2016-09-28",
    "policy_year": null,
    "premium_holiday_from_date": null,
    "premium_holiday_to_date": null,
    "product_type": "Non-Linked",
    "proposer_client_code": "",
    "life_insured_client_code": "",
    "ppt_completion_date": "2020-09-28",
    "ppt_left": 4,
    "auto_termination_date": null,
    "fund_value": null,
    "loan_amt": 0,
    "welcome_kit_current_status": null,
    "dispatch_date": "2013-10-25",
    "status_date": "2013-10-25",
    "updated_on": "2013-10-26 19:51:51.983",
    "surrender_value": null,
    "loan_eligiability": null,
    "courier_name": null,
    "awb_no": "",
    "status_id": 5,
    "claim_registration_date": "2021-12-15",
    "claim_approval_reject_date": "2021-12-15",
    "claim_requirement_document": "HEART ATTACK",
    "claim_requirement_status": "P",
    "requirement_raised_on": "2021-12-15",
    "requirement_closed_on": "2021-12-15",
    "claim_status": "P",
    "status_for_death_claims": "P",
    "date_of_transaction": "2021-12-15",
    "ecs_si_registration_date": null,
    "ecs_si_status": "inactive",
    "ecs_si_failure_reasons": null,
    "ecs_si_status_as_per_rls": null,
    "ecs_day": "0",
    "payment_mode": "S",
    "payment_mode_desc": "Semi Anually",
    "payment_method": "S",
    "payment_method_desc": "Cash/Cheque",
    "payment_date": "2021-12-15",
    "payment_amount": 13790.85,
    "premium_base_amt": null,
    "premium_tax_amt": null,
    "due_premium": null,
    "total_premium": 13790.85,
    "advance_premium": null,
    "assignee": [
        {
            "assignee_name": "",
            "attributes1": {
                "key": "assignee_client_id",
                "value": ""
            },
            "attributes2": {
                "key": "assignee_type",
                "value": ""
            }
        }
    ],
    "agents": [
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "agent_name": "",
            "attributes2": {
                "key": "agent_code",
                "value": "1234"
            },
            "attributes3": {
                "key": "email",
                "value": ""
            },
            "attributes4": {
                "key": "channel",
                "value": "AGENCY"
            },
            "attributes5": {
                "key": "subchannel",
                "value": null
            },
            "attributes6": {
                "key": "branch_name",
                "value": ""
            },
            "attributes7": {
                "key": "contact_no",
                "value": ""
            },
            "attributes8": {
                "key": "status",
                "value": "T"
            }
        }
    ],
    "bank_accounts": {
        "attributes1": {
            "key": "accountnumber",
            "value": ""
        },
        "attributes2": {
            "key": "accountholdername",
            "value": ""
        },
        "attributes3": {
            "key": "ifsccode",
            "value": ""
        },
        "attributes4": {
            "key": "micrcode",
            "value": ""
        },
        "bankname": "",
        "attributes5": {
            "key": "branchname",
            "value": ""
        },
        "attributes6": {
            "key": "bank_address",
            "value": ""
        }
    },
    "Funds": [
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "attributes2": {
                "key": "allocation_mode",
                "value": "percentage"
            },
            "attributes3": {
                "key": "allocation_value",
                "value": null
            },
            "attributes4": {
                "key": "fund_code",
                "value": null
            },
            "fund_name": null,
            "investment_strategy_name": null
        }
    ],
    "Investment_Strategies": [
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "attributes2": {
                "key": "allocation_mode",
                "value": "percentage"
            },
            "attributes3": {
                "key": "allocation_value",
                "value": null
            },
            "attributes4": {
                "key": "fund_code",
                "value": null
            },
            "fund_name": null,
            "investment_strategy_name": null
        }
    ],
    "Roles": [
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "attributes2": {
                "key": "beneficiary_percentage",
                "value": "100.00"
            },
            "attributes3": {
                "key": "relationship",
                "value": "SPOUSE"
            },
            "role_name": "Nominee",
            "attributes4": {
                "key": "age",
                "value": "NaN"
            },
            "attributes5": {
                "key": "party_name",
                "value": ""
            },
            "attributes6": {
                "key": "gender",
                "value": "F"
            },
            "attributes7": {
                "key": "date_of_birth",
                "value": "1953-03-20"
            },
            "attributes8": {
                "key": "email",
                "value": null
            },
            "attributes9": {
                "key": "registred_mobile",
                "value": null
            }
        },
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "attributes2": {
                "key": "beneficiary_percentage",
                "value": null
            },
            "attributes3": {
                "key": "relationship",
                "value": null
            },
            "role_name": "Insured",
            "attributes4": {
                "key": "age",
                "value": 63
            },
            "attributes5": {
                "key": "party_name",
                "value": ""
            },
            "attributes6": {
                "key": "gender",
                "value": "M"
            },
            "attributes7": {
                "key": "date_of_birth",
                "value": "1950-06-20"
            },
            "attributes8": {
                "key": "email",
                "value": null
            },
            "attributes9": {
                "key": "registred_mobile",
                "value": null
            }
        },
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "attributes2": {
                "key": "beneficiary_percentage",
                "value": null
            },
            "attributes3": {
                "key": "relationship",
                "value": null
            },
            "role_name": "Owner",
            "attributes4": {
                "key": "age",
                "value": 63
            },
            "attributes5": {
                "key": "party_name",
                "value": ""
            },
            "attributes6": {
                "key": "gender",
                "value": "M"
            },
            "attributes7": {
                "key": "date_of_birth",
                "value": "1950-06-20"
            },
            "attributes8": {
                "key": "email",
                "value": ""
            },
            "attributes9": {
                "key": "registred_mobile",
                "value": ""
            }
        }
    ],
    "Benefits": [
        {
            "attributes1": {
                "key": "policy_no",
                "value": ""
            },
            "benefit_name": "Miii",
            "attributes2": {
                "key": "benefit_term",
                "value": "7"
            },
            "attributes3": {
                "key": "modal_premium",
                "value": null
            },
            "attributes4": {
                "key": "sum_assured",
                "value": 192000
            },
            "attributes5": {
                "key": "cash_value",
                "value": 48576
            },
            "attributes6": {
                "key": "expiry_date",
                "value": null
            },
            "attributes7": {
                "key": "status",
                "value": "Death"
            },
            "attributes8": {
                "key": "benefit_code",
                "value": ""
            },
            "attributes9": {
                "key": "benefit_type",
                "value": "base"
            }
        }
    ]
}
'''
whatsinthename
  • 1,828
  • 20
  • 59
  • Can you clarify "Also, how could I format the records of the Roles array with some kind of indent in it?"? – Stev Mar 01 '22 at 11:15

2 Answers2

2

When you have a strong model in your data, and you want to process/validate the data you receive, its better to be done in a object oriented fashion. If you can consider your data value as an object with all logic written inside for validating or correcting, it becomes simple.

See an example approach below:

(Assumed your null as None)

import json

# Your data object..
data = '''{
        "id": 1,
        "name": "Hey",
        "category": null,
        "salary": 1222,
        "Roles": [
            {
                "attributes1": {
                "key": "sum_assured",
                "value": 192000
            },
                "attributes2": {
                    "key": "cash_value",
                    "value": 48576
                },
                "attributes3": {
                    "key": "expiry_date",
                    "value": null
                }
            }
        ]
    
    }'''


# Create  simple object models to represent the data.
class Role:
    def __init__(self, key, value):
        self.key = key
        self.value = value or ""  # fallback to empty string


class Policy:
    def __init__(self, id, name, category, salary, Roles):
        # write all your pre/post processing here
        self.id = id
        self.name = name
        self.category = category or ""
        self.salary = format(salary, '0.2f') if salary else ""
        self.Roles = [Role(**role) for role in Roles[0].values()] if Roles else []


# a json serialization helper function
def serial_default(obj):
    if isinstance(obj, (Policy, Role)):
        return obj.__dict__
    else:
        return obj


# You just create the object with your data. Now all data is in the object.
data = json.loads(data)
p = Policy(**data)

# Serialize the object to json in the proper format
print(json.dumps(p, default=serial_default))

The print generates a valid version of your object as JSON like

{"id": 1, "name": "Hey", "category": "", "salary": "1222.00", "Roles": [{"key": "sum_assured", "value": 192000}, {"key": "cash_value", "value": 48576}, {"key": "expiry_date", "value": ""}]}

This is just an example to the approach. You can modify or improvise a similar approach and make things simpler.

Kris
  • 8,680
  • 4
  • 39
  • 67
  • TYSM for your time @Kris. So basically I have my raw data enclosed under ' ' ' ....' ' ' and when I tried running this way it gave me an error: `p = Policy(**json_data) TypeError: type object argument after ** must be a mapping, not str` – whatsinthename Mar 01 '22 at 13:59
  • then load your raw data with `json.loads()` and then pass that object! – Kris Mar 01 '22 at 14:00
  • I still get the same error, Kris. Could you please check my updated actual dataset? – whatsinthename Mar 01 '22 at 14:16
  • Check the updated answer. I hope this is what you need. – Kris Mar 01 '22 at 14:21
  • Still the same error. Could you please check my updated dataset? and let me know what to do ? – whatsinthename Mar 01 '22 at 14:24
  • Your question has one direction and dataset another. I cannot see all the fields you mentioned in your original question. Where is the ID, name, category etc etc. I can see only Roles here, that too with a different format. – Kris Mar 01 '22 at 14:28
  • My bad, Kris. I thought it is the same one. Could you please help me with my actual dataset? Or will the code work if I just pass the variables for each node by creating separate classes? – whatsinthename Mar 01 '22 at 14:37
  • I think, I have communicated the idea on how to approach the problem. A complete working solution is not something I can work on now. Give it a try! – Kris Mar 01 '22 at 14:39
  • Kris, Thank you for your approach. Reminds me of my old OOPs days. However, your output is incorrect than my expected output, or else would have tried your way. Thanks for your time though :) – whatsinthename Mar 01 '22 at 15:22
1

Firstly i have to agree with Kris answer, this kind of conversion looks now a bit ugly and with every added special case Object-Oriented-style makes more sense.

A few points:

  1. null, None, ""

It seems that null won't be seen as None, so your not getting into the if clause. Check JSON to Dict for that

How can JSON data with null value be converted to a dictionary

  1. Salary

According to your question you would expect "salary": "1222.00" correct? If so you are currently overwriting your previously formatted number with

if isinstance(j, str):
   new_dict[i] = j

Just change it to elif so you wont overwrite your conversion.

  1. Roles

The objects within the array of Roles are currently not iterated over. Hence the number are not being formatted. For that you would need another loop on the Roles array.

Following a recursion attempt to cover multiple nested collections.

import json
from collections import abc

dataRaw = '''
{
    "id": 1,
    "name": "Hey",
    "category": null,
    "salary": 1222,
    "Roles": [
        {
            "attributes1": {
            "key": "sum_assured",
            "value": 192000
        },
            "attributes2": {
                "key": "cash_value",
                "value": 48576
            },
            "attributes3": {
                "key": "expiry_date",
                "value": null
            }
        }
    ]

}
'''

data = json.loads(dataRaw)
def processNested(nested):
    # Check for list
    if (isinstance(nested, list)):
        # for each element in list processNested
        for element in nested:
            element = processNested(element)
    else:
        for key, value in nested.items():
            # If nested Collection
            if isinstance(value, abc.Mapping):
                # Allow special "treatment" for {"key": "...", "value": ... } like objects 
                if "key" in value and "value" in value:
                    if ((value["key"] == "cash_value" or value["key"] == "sum_assured") and value["value"] != "" and
                            value["value"] != None):
                        nested[key]["value"] = str("%0.2f" % int(value["value"]))
                    else:
                        nested[key] = processNested(value)
            else:
                if (key == "value" and value != None and value != ""):
                    nested[key] = str(nested["value"])
    return nested

new_dict = data
for i, j in new_dict.items():
    if (i == "salary"):
        new_dict[i] = "%0.2f" % j
    elif isinstance(j, str):
        new_dict[i] = j
    elif j is None:
        new_dict[i] = ""
    elif i == "Roles":
        new_dict[i] = processNested(j)
    else:
        new_dict[i] = str(j)

print(new_dict)
Stev
  • 416
  • 4
  • 11
  • Still, it doesn't work for the value attribute. How can I access the nested JSON attributes is the question here – whatsinthename Mar 01 '22 at 12:09
  • Basically, I just want to access all the nested keys and values from a dictionary – whatsinthename Mar 01 '22 at 12:26
  • I updated my answer with a function to process the nested part. Tell me if it fits your task and if it need further clarification. – Stev Mar 01 '22 at 13:23
  • Please find my updated actual dataset. – whatsinthename Mar 01 '22 at 14:24
  • Your code works fine. Thanks a lot. Just one question if I just want the amount of `cash_value` to be converted to float, how can I do it? – whatsinthename Mar 01 '22 at 15:03
  • See Updated answer, but pls consider Kris answer aswell, as it becomes more unreadable with further additions. If you consider your original question answered, please mark the given answers as helpful or solution. Thx – Stev Mar 01 '22 at 15:33
  • Appreciate your efforts Stev. Yeah, Kris's answer is great as well but it's an exhaustive process considering my dataset attributes. I want to modify your answer but your answer's edit queue is full :) Post that I can accept your answer :) – whatsinthename Mar 01 '22 at 15:57
  • Also, I need a walkthrough of your code. Why are your using indexing? – whatsinthename Mar 01 '22 at 16:51
  • Well, indexing was a leftover, thought i needed it, but wasn't needed. for i in list will do fine. I have updated my answer & added a few comments – Stev Mar 02 '22 at 10:17