Many questions and answers explain how to load a json file from a URL into a pandas data frame, using low level methods for example JSON to pandas DataFrame . I managed to make it work but if possible, I would prefer to use the simple pandas.io.json
method to load the data frame directly from a url. The problem is that pandas.io.json.read_json
returns a ValueError, which other answers such as ValueError: Expected object or value when reading json as pandas dataframe say can be fixed by setting the orient
argument, but what I would need is a different argument that would make it possible to specify that the data frame is one level down in the nested json at the "results" level.
# Sample content of the json file
# {
# "more": false,
# "results": [
# {
# "id": "all",
# "text": "All"
# },
# {
# "ID":"4",
# "text": "Afghanistan"
# },
# {
# "id": "975",
# "text": "ASEAN"
# }
# ]
# }
import urllib.request
import json
import pandas
url_reporter = "https://comtrade.un.org/Data/cache/reporterAreas.json"
url_partner = "https://comtrade.un.org/Data/cache/partnerAreas.json"
# Failed attempt with pandas.io
pandas.io.json.read_json(url_reporter)
# ~/.local/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
# 1138 if orient == "columns":
# 1139 self.obj = DataFrame(
# -> 1140 loads(json, precise_float=self.precise_float), dtype=None
# 1141 )
# 1142 elif orient == "split":
#
# ValueError: Expected object or value
pandas.io.json.read_json(url_partner)
# Returns a fattened data frame with json as the cell content in a "results" column
# Successful attempt using lower level packages
req = urllib.request.Request(url=url_reporter)
with urllib.request.urlopen(req) as response:
json_content = json.load(response)
df = pandas.json_normalize(json_content['results'])
In [34]: df
Out[34]:
id text
0 all All
1 0 World
2 4 Afghanistan
3 472 Africa CAMEU region, nes
4 8 Albania
.. ... ...
289 879 Western Asia, nes
290 732 Western Sahara
291 887 Yemen
292 894 Zambia
293 716 Zimbabwe
[294 rows x 2 columns]