I have project.json
file, which contains data like this :
{"student_id": "ST0001", "project": [{"subject_id": "S003", "date_of_submission": "2021-05-23 20:03:05"}, {"subject_id": "S004", "date_of_submission": "2021-05-24 21:03:05"}, {"subject_id": "S005", "date_of_submission": "2021-05-30 05:09:30"}], "project_year": "Second"}
{"student_id": "ST0002", "project": [{"subject_id": "S003", "date_of_submission": "2021-06-02 15:05:05"}, {"subject_id": "S007", "date_of_submission": "2021-04-28 21:03:01"}], "project_year": "Second"}
{"student_id": "ST0002", "project": [{"subject_id": "S0018", "date_of_submission": "2020-06-03 08:15:21"}], "project_year": "First"}
I need to extract the nested subject_id
and date_of_submission
into a separate column like :
student_id | subject_id | date_of_submission | project_year |
---|---|---|---|
ST0001 | S003 | 23/05/2021 20:03 | Second |
ST0001 | S004 | 24/05/2021 21:03 | Second |
ST0001 | S005 | 30/05/2021 05:09 | Second |
ST0002 | S003 | 02/06/2021 15:05 | Second |
ST0002 | S007 | 28/04/2021 21:03 | Second |
ST0002 | S0018 | 03/06/2020 08:15 | First |
I think we can use json_normalize
to extract one level up, can someone help me complete this;
import pandas as pd
df=pd.read_json('project.json', lines=True)
df = pd.DataFrame(df).explode('project')