I am trying to create a pandas dataframe out of a nested json. For some reason, I seem to be unable to address the third level.
My json looks something like this:
"numberOfResults": 376,
"results": [
{
"name": "single",
"docs": [
{
"id": "RAKDI342342",
"type": "Culture",
"category": "Culture",
"media": "unknown",
"label": "exampellabel",
"title": "testtitle and titletest",
"subtitle": "Archive"
]
},
{
"id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER",
"type": "Culture",
"category": "Culture",
"media": "image",
"label": "more label als example",
"title": "test the second title",
"subtitle": "picture"
and so on.
Within the "docs"-part are all the actual results, starting with "id". Once all the information is there, the next block starting with "id" simply follows.
Now I am trying to create a table with the keys id, label and title (for a start) for each of these separate blocks (in this case actual items).
After defining the search_url (where I get the json from), my code for this currently looks like this:
result = requests.get(search_url)
data = result.json()
data.keys()
With this, I get told that they dict_keys are the following:
dict_keys(['numberOfResults', 'results', 'facets', 'entities', 'fulltexts', 'correctedQuery', 'highlightedTerms', 'randomSeed', 'nextCursorMark'])
Given the json from above, I know I want to look into "results" and then further into "docs". According to the documentation I found, I should be able to achieve this by addressing the results-part directly and then addressing the nested bit by separating the fields with ".". I have now tried the following the code:
fields = ["docs.id", "docs.label", "docs.title"]
df = pd.json_normalize(data["results"])
df[fields]
This works until df[field] - at this stage the programm tells me:
KeyError: "['docs.id'] not in index"
It does work for the level above though, so if I try the same with "name" and "docs" I get a lovely dataframe. What am I doing wrong? I am still a python and pandas beginner and would appreciate any help very much!
EDIT:
The desired dataframe output would look roughly like this:
id label title
0 RAKDI342342 exampellabel testtitle and titletest