4

So here's my simple example (the json field in my actual dataset is very nested so I'm unpacking things one level at a time). I need to keep certain columns on the dataset post json_normalize().

https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

Start: Start

Expected (Excel mockup): Expected

Actual: Actual

import json

d = {'report_id': [100, 101, 102], 'start_date': ["2021-03-12", "2021-04-22", "2021-05-02"], 
     'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}', '{"name":"Mary", "age":22, "disease":"B-Pox"}', '{"name":"Karen", "age":42, "disease":"C-Pox"}']}

df = pd.DataFrame(data=d)
display(df)

df = pd.json_normalize(df['report_json'].apply(json.loads), max_level=0, meta=['report_id', 'start_date'])
display(df)

Looking at the documentation on json_normalize(), I think the meta parameter is what I need to keep the report_id and start_date but it doesn't seem to be working as the expected fields to keep are not appearing on the final dataset.

Does anyone have advice? Thank you.

Anonymous
  • 453
  • 1
  • 6
  • 14
  • Yes, the accepted answer from the link you sent was out of date but the comments were very useful! The solution to use: df = df.join(pd.json_normalize(...)) worked for me! I am still very curious why "meta= " isn't working me though. Anyhow, thank you! – Anonymous May 21 '22 at 18:34
  • 1
    good use of images to convey your [mcve] btw. – Umar.H May 21 '22 at 18:39
  • 1
    You can look at [pandas.json_normalize](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html), `meta` usually works with `record_path` argument, it's used to select keys not in `record_path`. In your example you are using `pd.json_normalize(df['report_json'].apply(json.loads))`, `df['report_json'].apply(json.loads)` doesn't contain any key like `report_id`. And `pd.json_normalize` is usually used for dict rather than a DataFrame. – Ynjxsjmh May 21 '22 at 18:40

1 Answers1

6

as you're dealing with a pretty simple json along a structured index you can just normalize your frame then make use of .join to join along your axis.

from ast import literal_eval


df.join(
      pd.json_normalize(df['report_json'].map(literal_eval))
 ).drop('report_json',axis=1)


   report_id  start_date   name  age disease
0        100  2021-03-12   John   30   A-Pox
1        101  2021-04-22   Mary   22   B-Pox
2        102  2021-05-02  Karen   42   C-Pox
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • Hi, I was able to make it work with the simple join and drop, is the .map(literal_eval) necessary? – Anonymous May 21 '22 at 18:39
  • 1
    it depends if you're dealing with a stringified json object or a string @Anonymous - you can also use `json.loads` which is probably better. from the docs `data` expects `dict or list of dicts Unserialized JSON objects.` – Umar.H May 21 '22 at 18:41
  • how would you include json.loads() i – QuentinJS Jan 15 '23 at 06:12