How to normalize a nested json with json_normalize

Question

I am trying to create a pandas dataframe out of a nested json. For some reason, I seem to be unable to address the third level.

My json looks something like this:

  "numberOfResults": 376,
  "results": [
    {
      "name": "single",
      "docs": [
        {
          "id": "RAKDI342342",
          "type": "Culture",
          "category": "Culture",
          "media": "unknown",
          "label": "exampellabel",
          "title": "testtitle and titletest",
          "subtitle": "Archive" 

            ]
        },
        {
          "id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER",
          "type": "Culture",
          "category": "Culture",
          "media": "image",
          "label": "more label als example",
          "title": "test the second title",
          "subtitle": "picture"

and so on.

Within the "docs"-part are all the actual results, starting with "id". Once all the information is there, the next block starting with "id" simply follows.

Now I am trying to create a table with the keys id, label and title (for a start) for each of these separate blocks (in this case actual items).

After defining the search_url (where I get the json from), my code for this currently looks like this:

result = requests.get(search_url)
data = result.json()
data.keys()

With this, I get told that they dict_keys are the following:

dict_keys(['numberOfResults', 'results', 'facets', 'entities', 'fulltexts', 'correctedQuery', 'highlightedTerms', 'randomSeed', 'nextCursorMark'])

Given the json from above, I know I want to look into "results" and then further into "docs". According to the documentation I found, I should be able to achieve this by addressing the results-part directly and then addressing the nested bit by separating the fields with ".". I have now tried the following the code:

fields = ["docs.id", "docs.label", "docs.title"]
df = pd.json_normalize(data["results"])
df[fields]

This works until df[field] - at this stage the programm tells me:

KeyError: "['docs.id'] not in index"

It does work for the level above though, so if I try the same with "name" and "docs" I get a lovely dataframe. What am I doing wrong? I am still a python and pandas beginner and would appreciate any help very much!

EDIT:

The desired dataframe output would look roughly like this:

    id              label            title  
0   RAKDI342342     exampellabel     testtitle and titletest

Trenton McKinney · Accepted Answer · 2021-04-29T19:26:40.713

Use pandas.json_normalize()
The following code uses pandas v.1.2.4
If you don't want the other columns, remove the list of keys assigned to meta
Use pandas.DataFrame.drop to remove any other unwanted columns from df.

import pandas as pd

df = pd.json_normalize(data, record_path=['results', 'docs'], meta=[['results', 'name'], 'numberOfResults'])

display(df)
                                 id     type category    media                   label                    title subtitle results.name numberOfResults
0                       RAKDI342342  Culture  Culture  unknown            exampellabel  testtitle and titletest  Archive       single             376
1  GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER  Culture  Culture    image  more label als example    test the second title  picture       single             376

Data

The posted JSON / Dict is not correctly formed
Assuming the following corrected form

data = \
{'numberOfResults': 376,
 'results': [{'docs': [{'category': 'Culture',
                        'id': 'RAKDI342342',
                        'label': 'exampellabel',
                        'media': 'unknown',
                        'subtitle': 'Archive',
                        'title': 'testtitle and titletest',
                        'type': 'Culture'},
                       {'category': 'Culture',
                        'id': 'GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER',
                        'label': 'more label als example',
                        'media': 'image',
                        'subtitle': 'picture',
                        'title': 'test the second title',
                        'type': 'Culture'}],
              'name': 'single'}]}

How to normalize a nested json with json_normalize

1 Answers1

Data

Linked

Related