1

Need to extract value from a json string stored in pandas column and assign it to a column with a conditional apply to rows with null values only.

df = pd.DataFrame({'col1': [06010, np.nan, 06020, np.nan],
                   'json_col': [{'Id': '060',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06037'}
                                 },
                                 {'Id': '061',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06038'}
                                 },
                                 {'Id': '062',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06039'}
                                 },
                                 {'Id': '063',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06040'}
                                 }],
                })
                         

# Check for null condition and extract Code from json string

df['Col1'] = df[df['Col1'].isnull()].apply(lambda x : [x['json_col'][i]['value']['Code'] for i in x])

Expected result:

Col1

06010
06038
06020
06040
kms
  • 1,810
  • 1
  • 41
  • 92
  • for performance reasons, it is better if you preprocess the json data into flat form before creating the dataframe. – sammywemmy Sep 12 '21 at 04:48

2 Answers2

1

To extract field from a dictionary column, you can use .str accessor. For instance, to extract json_col -> value -> code you can use df.json_col.str['value'].str['Code']. And then use fillna to replace nan in col1:

df.col1.fillna(df.json_col.str['value'].str['Code'])

0    06010
1    06038
2    06020
3    06040
Name: col1, dtype: object
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • For some reason, this isn't working for me. Doesn't throw any errors, but it isn't assign values either. – kms Sep 13 '21 at 05:54
  • You can just assign it to a column with: `df['col1'] = df.col1.fillna(df.json_col.str['value'].str['Code'])` – Psidom Sep 13 '21 at 06:06
0

Try with this:

>>> df['col1'].fillna(df['json_col'].map(lambda x: x['value']['Code']))
0    06010
1    06038
2    06020
3    06040
Name: col1, dtype: object
>>> 
U13-Forward
  • 69,221
  • 14
  • 89
  • 114