Parsing Column in Pandas DataFrame with one column that contains a nested JSON string

Question

I have a DataFrame in Python that looks like the one below. Theres a column (referred to below as 'json') that has a large nested JSON string in it. How can I parse it out so I can have a nice clean data frame with many columns. just particularly need the cost and monthly amounts for each ID in separate columns. Ideally i'd have a table that looks like:

id, name, cost, monthly

10001, frank, 15.85, 15.85

10002, mary, 30.86, 23.03

    d = {'id': ['10001', '10002'], 'json': ['{"costs":[{"cost":15.85}],"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":15.85,"rating":"A++","waiverOfPremium":1.74,"carrier":"companyabc","face":250000,"term":20,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}],"agentSuggestion":{"costs":[{"cost":15.85}],"options":{"product":"XYZt","gender":"male","healthClass":"0","smoker":"false","age":32,"term":"20","faceAmount":250000,"waiverOfPremiumAmount":1.74,"includeWaiverOfPremium":false,"state":"CT"},"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":15.85,"rating":"A++","waiverOfPremium":1.74,"carrier":"companyabc","face":250000,"term":20,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}]}}', '{"costs":[{"cost":30.86}],"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":23.03,"rating":"A++","waiverOfPremium":7.83,"carrier":"companyabc","face":1000000,"term":10,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}],"agentSuggestion":{"costs":[{"cost":30.86}],"options":{"product":"XYZt","gender":"female","healthClass":"0","smoker":"false","age":35,"term":10,"faceAmount":1000000,"waiverOfPremiumAmount":7.83,"includeWaiverOfPremium":true,"state":"GA"},"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":23.03,"rating":"A++","waiverOfPremium":7.83,"carrier":"companyabc","face":1000000,"term":10,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}]}}'], 'name':['frank','mary']}

   test = pd.DataFrame(data=d)

score 4 · Accepted Answer · answered Apr 16 '18 at 15:56

There you go. You have 2 different costs in your JSON (cost and agentSuggestion cost), so both are added here:

import json
test = pd.DataFrame(d, columns = ['id', 'json', 'name'])
test['cost'] = test['json'].transform(lambda x: json.loads(x)['costs'][0]['cost'])
test['agent_suggestion_cost'] = test['json']\
    .transform(lambda x: json.loads(x)['agentSuggestion']["costs"][0]['cost'])
print(test)

You can follow similar logic for parsing other fields like monthly. For more reference see e.g.here Look around for a JSON prettifier (e.g. JSTool with Notepad++) to view your JSON's structure, this will help to understand its structure.

Kindly accept the answer if you found it useful.

Thanks! This exactly what I was looking for, it worked beautifully. Thanks for the help! — PokerFace, Apr 16 '18 at 17:31

score 0 · Answer 2 · answered Apr 16 '18 at 15:22

Pandas offers a couple of utilities for dealing with json files. The ones that make sense for your case are pd.read_json and pd.io.json_normalize. They do however expect input in a different json format than yours.

orient : string,

Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:

'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
'records' : list like [{column -> value}, ... , {column -> value}]
'index' : dict like {index -> {column -> value}}
'columns' : dict like {column -> {index -> value}}
'values' : just the values array
The allowed and default values depend on the value of the typ parameter.

when typ == 'series',
allowed orients are {'split','records','index'}
default is 'index'
The Series index must be unique for orient 'index'.
when typ == 'frame',
allowed orients are {'split','records','index', 'columns','values'}
default is 'columns'
The DataFrame index must be unique for orients 'index' and 'columns'.
The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.

Parsing Column in Pandas DataFrame with one column that contains a nested JSON string

2 Answers2