How do I json_normalize() a specific field within a df and keep the other columns?

Question

So here's my simple example (the json field in my actual dataset is very nested so I'm unpacking things one level at a time). I need to keep certain columns on the dataset post json_normalize().

https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

Start:

Expected (Excel mockup):

Actual:

import json

d = {'report_id': [100, 101, 102], 'start_date': ["2021-03-12", "2021-04-22", "2021-05-02"], 
     'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}', '{"name":"Mary", "age":22, "disease":"B-Pox"}', '{"name":"Karen", "age":42, "disease":"C-Pox"}']}

df = pd.DataFrame(data=d)
display(df)

df = pd.json_normalize(df['report_json'].apply(json.loads), max_level=0, meta=['report_id', 'start_date'])
display(df)

Looking at the documentation on json_normalize(), I think the meta parameter is what I need to keep the report_id and start_date but it doesn't seem to be working as the expected fields to keep are not appearing on the final dataset.

Does anyone have advice? Thank you.

Yes, the accepted answer from the link you sent was out of date but the comments were very useful! The solution to use: df = df.join(pd.json_normalize(...)) worked for me! I am still very curious why "meta= " isn't working me though. Anyhow, thank you! — Anonymous, May 21 '22 at 18:34
You can look at [pandas.json_normalize](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html), `meta` usually works with `record_path` argument, it's used to select keys not in `record_path`. In your example you are using `pd.json_normalize(df['report_json'].apply(json.loads))`, `df['report_json'].apply(json.loads)` doesn't contain any key like `report_id`. And `pd.json_normalize` is usually used for dict rather than a DataFrame. — Ynjxsjmh, May 21 '22 at 18:40

score 6 · Accepted Answer · answered May 21 '22 at 18:34

6

as you're dealing with a pretty simple json along a structured index you can just normalize your frame then make use of .join to join along your axis.

from ast import literal_eval


df.join(
      pd.json_normalize(df['report_json'].map(literal_eval))
 ).drop('report_json',axis=1)


   report_id  start_date   name  age disease
0        100  2021-03-12   John   30   A-Pox
1        101  2021-04-22   Mary   22   B-Pox
2        102  2021-05-02  Karen   42   C-Pox

answered May 21 '22 at 18:34

Umar.H

22,559
7
39
74

Hi, I was able to make it work with the simple join and drop, is the .map(literal_eval) necessary? – Anonymous May 21 '22 at 18:39
1

it depends if you're dealing with a stringified json object or a string @Anonymous - you can also use `json.loads` which is probably better. from the docs `data` expects `dict or list of dicts Unserialized JSON objects.` – Umar.H May 21 '22 at 18:41
how would you include json.loads() i – QuentinJS Jan 15 '23 at 06:12

How do I json_normalize() a specific field within a df and keep the other columns?

1 Answers1

Linked