So I wanted to flatten nested json data in this pandas data frame as additional columns. I am using requests to get json data from an api and create a pandas data frame.
import requests import pandas as pd import json
url = 'https://api.openaq.org/v2/locations?coordinates=38.7437396,-9.2302436&radius=5000'
api_request = requests.get(url)
api_content = api_request.content
api_json = json.loads(api_content)
df = pd.json_normalize(api_json)
df
This outputs the following:
results meta.name meta.license meta.website meta.page meta.limit meta.found
0 [{'id': 8209, 'city': 'Lisboa', 'name': 'PT031... openaq-api CC BY 4.0d api.openaq.org 1 100 4
so, I did
df = pd.json_normalize(api_json, record_path=["results"])
Which gives a nicer result:
id city name entity country sources isMobile isAnalysis parameters sensorType lastUpdated firstUpdated measurements bounds manufacturers coordinates.latitude coordinates.longitude
0 8209 Lisboa PT03100 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25414, 'unit': 'µg/m³', 'count': 10005... reference grade 2022-08-27T16:00:00+00:00 2017-09-22T11:00:00+00:00 246216 None None 38.748056 -9.202500
1 8211 Lisboa PT03084 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25397, 'unit': 'µg/m³', 'count': 93281... reference grade 2022-08-27T16:00:00+00:00 2017-09-22T10:00:00+00:00 239733 None None 38.754167 -9.230833
2 7199 Lisboa PT03087 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 24800, 'unit': 'µg/m³', 'count': 97702... reference grade 2022-08-27T16:00:00+00:00 2017-09-22T13:00:00+00:00 235512 None None 38.705000 -9.210278
3 7767 Lisboa PT03082 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25393, 'unit': 'µg/m³', 'count': 83437... reference grade 2022-08-27T16:00:00+00:00 2018-02-27T01:00:00+00:00 304693 None None 38.738889 -9.207500
However, the 'sources' and 'parameter' columns are still in json format. How can I unpack these into new columns?
Also, can I for instance just pull the 'url' key out of the 'sources' json data and make it its own column?