0

I am struggling with the conversion of JSON to DataFrame. I get a JSON as a result of API request.

I tried pd.read_json, and pd.json_normalize and nothing helps. The result is always a DataFrame with (1,1) shape. I went through similar requests but looks like json_normalize helps everyone except me.

This is how my code looks today:

import requests
import json
import pandas as pd

response = requests.get("https://******************************/parameters/100135?from=1h",
                        auth = ('*******', '*******'), verify=False)
t = response.text

stud_obj = json.loads(t)
m = pd.json_normalize(stud_obj)

Here is an example of the JSON:

{
 "DataList" : [
  {
   "Parameter" : 100135,
   "Parameter Label" : "TC3; GR53; Irrigation Zones GR53; Row 2 Level 3 (6); Duration Of Waterings {",
   "DataSet" : [
    {
     "Data" : "0:00:00",
     "Time" : "Fri Jun 25 00:00:01 2021"
    },
    {
     "Data" : "0:00:01",
     "Time" : "Fri Jun 25 07:51:33 2021"
    },
    {
     "Data" : "0:00:02",
     "Time" : "Fri Jun 25 07:51:34 2021"
    },
    {
     "Data" : "0:00:03",
     "Time" : "Fri Jun 25 07:51:35 2021"
    }
   ]
  }
 ]
}

Here is a DataFrame as a result:

print(m)
                                                DataList
    0  [{'Parameter': 100135, 'Parameter Label': 'TC3...
m.shape
    (1, 1)
Oleg Kazanskyi
  • 196
  • 2
  • 13

3 Answers3

2

There may be efficient solution but untill you can use:

out=pd.DataFrame(pd.Series(data).explode().tolist()).explode('DataSet',ignore_index=True)
#Here data is your json

Finally:

out=out.join(pd.DataFrame(out.pop('DataSet').tolist()))

output of out:

    Parameter   Parameter Label                                 Data     Time
0   100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...   0:00:00     Fri Jun 25 00:00:01 2021
1   100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...   0:00:01     Fri Jun 25 07:51:33 2021
2   100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...   0:00:02     Fri Jun 25 07:51:34 2021
3   100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...   0:00:03     Fri Jun 25 07:51:35 2021
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
2

You can use pd.json_normalize() + pd.DataFrame(), as follows:

Using your loaded json stud_obj:

df_out = pd.json_normalize(stud_obj['DataList']).explode('DataSet').reset_index(drop=True)

df_out = pd.concat([df_out.drop('DataSet', axis=1), pd.DataFrame(df_out['DataSet'].to_list())], axis=1)

Result:

print(df_out)


   Parameter                                                               Parameter Label     Data                      Time
0     100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level 3 (6); Duration Of Waterings {  0:00:00  Fri Jun 25 00:00:01 2021
1     100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level 3 (6); Duration Of Waterings {  0:00:01  Fri Jun 25 07:51:33 2021
2     100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level 3 (6); Duration Of Waterings {  0:00:02  Fri Jun 25 07:51:34 2021
3     100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level 3 (6); Duration Of Waterings {  0:00:03  Fri Jun 25 07:51:35 2021
SeaBean
  • 22,547
  • 3
  • 13
  • 25
0

pd.json_normalize() can do the job by itself:

d = { ... your JSON object ... }
pd.json_normalize(d,['DataList','DataSet'],[['DataList','Parameter'],['DataList','Parameter Label']])

     Data             Time            DataList.Parameter     DataList.Parameter Label
0   0:00:00     Fri Jun 25 00:00:01 2021    100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...
1   0:00:01     Fri Jun 25 07:51:33 2021    100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...
2   0:00:02     Fri Jun 25 07:51:34 2021    100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...
3   0:00:03     Fri Jun 25 07:51:35 2021    100135  TC3; GR53; Irrigation Zones GR53; Row 2 Level ...
jch
  • 3,600
  • 1
  • 15
  • 17