1

I have read data from particular API using the following Python lines

import requests 
import json
# read all Measurement from one sensor for several days.
r = requests.get('https://wastemanagement.post-iot.lu/measurement/measurements?source=83512 pageSize=1000000000&dateFrom=2019-10-26&dateTo=2019-10-28', auth=('xxxxx', 'xxxxx'))
text=r.text  # data is stored in json format  

This is an example of the output but the actual output contains several lines: Actual output can be downloaded from here actual output

enter code here
Valid JSON (RFC 8259)
Formatted JSON Data
{
"next":"https://wastemanagement.post-iot.lu/measurement/measurements?dateTo=2019-10-28&   
 pageSize=2000&source=83512&dateFrom=2019-10-26&currentPage=2",
"self":"https://wastemanagement.post-iot.lu/measurement/measurements?dateTo=2019-10-28& 
 pageSize=2000&source=83512&dateFrom=2019-10-26&currentPage=1",
"statistics":{
"totalPages":null,
"currentPage":1,
"pageSize":2000
},
"measurements":[
{
     "self":"https://wastemanagement.post-iot.lu/measurement/measurements/108451",
     "time":"2019-10-26T00:00:06.494Z",
     "id":"108451",
     "source":{
        "self":"https://wastemanagement.post-iot.lu/inventory/managedObjects/83512",
        "id":"83512"
     },
     "type":"c8y_Measurement",
     "battery":{
        "percent":{
           "unit":"%",
           "value":98
        }
     }
  },
  {
     "self":"https://wastemanagement.post-iot.lu/measurement/measurements/108452",
     "time":"2019-10-26T00:00:06.538Z",
     "id":"108452",
     "source":{
        "self":"https://wastemanagement.post-iot.lu/inventory/managedObjects/83512",
        "id":"83512"
     },
     "type":"TemperatureMeasurement",
     "c8y_TemperatureMeasurement":{
        "T":{
           "unit":"C",
           "value":23
        }
     }
  },
  {
     "self":"https://wastemanagement.post-iot.lu/measurement/measurements/108537",
     "time":"2019-10-26T00:00:06.577Z",
     "id":"108537",
     "source":{
        "self":"https://wastemanagement.post-iot.lu/inventory/managedObjects/83512",
        "id":"83512"
     },
     "type":"c8y_Measurement",
     "c8y_DistanceMeasurement":{
        "distance":{
           "unit":"cm",
           "value":21
        }
     }
  },

the attached image show the output in the (text) enter image description here

I have tried to convert the JSON format (Which is stored in text) to pandas DataFrame using the following code but doesnt work

data = json.loads(text) 

I would like to convert JSON format to pandas DataFrame, this Dataframe should contain five columns (id, battery,T or Temperature, time, and distance) and several rows

enter image description here

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Wisam hasan
  • 187
  • 1
  • 3
  • 13

2 Answers2

3

You can try this, it works well

// importing required libraries
import pandas as pd
import json
import requests

// hosted your json response as a url response 
URL = 'https://my-json-server.typicode.com/abhikumar22/JsonServer/data'

// getting requests from the server
req = requests.get(URL )
text_data= req.text
json_dict= json.loads(text_data)

// converting the json dictionary to a dataframe
df = pd.DataFrame.from_dict(json_dict["measurements"])
cols_to_keep = ['id','battery','c8y_TemperatureMeasurement','time','c8y_DistanceMeasurement']
df_final = df[cols_to_keep]
df_final = df_final.rename(columns={'c8y_TemperatureMeasurement': 'Temperature Or T','c8y_DistanceMeasurement':'Distance'})
print(df_final)

result of the code you can see find here

You will get the desired result which you want rest some column values you can modify further for getting the column values.

abhikumar22
  • 1,764
  • 2
  • 11
  • 24
  • 1
    thanks for your code but I didnt get the desired output because (json_dict= json.loads(text_data)) doesnt give the correct output. For example, I am looking for managedObjects (i.e., 83512) while I have got measurements id (i.e., 108451), as well as there are another parameters are missing (Temperature and distance). – Wisam hasan Nov 15 '19 at 09:55
  • oh.. yes, you can just edit the **source** column and get desired id's from there, just keep the **source** and neglect the **id** column. Then apply lambda function on the whole column so that it will transform the whole column. Furthermore, `json_dict= json.loads(text_data)` works well in my end, please check once. Try to run the code in jupyter notebook, i have checked it and its working fine – abhikumar22 Nov 15 '19 at 10:10
  • abhikumar22, thank you so much for your support, really approciate it (i got what you mean and correctly modified). Kindly, could you help me to format the output, as the current one contains severl Nan as well as the id, battery, Temperature, and distance columns contains text and i am looking for the number only. Please see the desired output. thanks again – Wisam hasan Nov 15 '19 at 10:46
1

This should work. Btw, I'm not sure which time specifically you need in the dataframe. Therefore, I included all of them in the solution (if you're not certain of the order in which the measurements come)

import pandas as pd
import numpy as np
import json
from collections import OrderedDict
json_str = {
"next":"https://wastemanagement.post-iot.lu/measurement/measurements?dateTo=2019-10-28&pageSize=2000&source=83512&dateFrom=2019-10-26&currentPage=2",
"self":"https://wastemanagement.post-iot.lu/measurement/measurements?dateTo=2019-10-28&pageSize=2000&source=83512&dateFrom=2019-10-26&currentPage=1",
"statistics":{
"totalPages":"null",
"currentPage":1,
"pageSize":2000
},
"measurements":[
{
     "self":"https://wastemanagement.post-iot.lu/measurement/measurements/108451",
     "time":"2019-10-26T00:00:06.494Z",
     "id":"108451",
     "source":{
        "self":"https://wastemanagement.post-iot.lu/inventory/managedObjects/83512",
        "id":"83512"
     },
     "type":"c8y_Measurement",
     "battery":{
        "percent":{
           "unit":"%",
           "value":98
        }
     }
  },
  {
     "self":"https://wastemanagement.post-iot.lu/measurement/measurements/108452",
     "time":"2019-10-26T00:00:06.538Z",
     "id":"108452",
     "source":{
        "self":"https://wastemanagement.post-iot.lu/inventory/managedObjects/83512",
        "id":"83512"
     },
     "type":"TemperatureMeasurement",
     "c8y_TemperatureMeasurement":{
        "T":{
           "unit":"C",
           "value":23
        }
     }
  },
  {
     "self":"https://wastemanagement.post-iot.lu/measurement/measurements/108537",
     "time":"2019-10-26T00:00:06.577Z",
     "id":"108537",
     "source":{
        "self":"https://wastemanagement.post-iot.lu/inventory/managedObjects/83512",
        "id":"83512"
     },
     "type":"c8y_Measurement",
     "c8y_DistanceMeasurement":{
        "distance":{
           "unit":"cm",
           "value":21
        }
     }
   }]
     }


#json_str2 = json.dumps(dct)
df = pd.io.json.json_normalize(json_str)
df2 = pd.io.json.json_normalize(
    OrderedDict([(str(i), v) for i, v in enumerate(df["measurements"].tolist()[0])]))

# If you are certain that the list always comes in that order
df = pd.concat([df, df2], axis=1)
df[["0.source.id", "2.time","0.battery.percent.value", "1.c8y_TemperatureMeasurement.T.value", "2.c8y_DistanceMeasurement.distance.value"]]

# If you are uncertain of the order
cols = ['0.source.id'] + \
[c for c in df.columns if ('time' in c or 'emperatureMeasurement.T.value' in c or 'DistanceMeasurement.distance.unit' in c or 'battery.percent.value' in c)]
df[cols].head()
thushv89
  • 10,865
  • 1
  • 26
  • 39
  • Dear @thushv89, thank you so much for your support, really approciate it. but I have several input in the JSON file I have around 5000 rows . at the moment your code is perfect for single entry (one row only) but I have data for several rows (e.g., assume the table has 5000 entries ). I have attached an example of what I have got from the attached code but the actual output I have several input for id, battery,T or Temperature, time, and distance. I have attached the the output of runing this line text=r.text (i.e., JSON format). Please download this file https://filebin.net/c0cxryxe6rjspqxf – Wisam hasan Nov 15 '19 at 10:13