2

I am trying to create a pandas dataframe out of a nested json. For some reason, I seem to be unable to address the third level.

My json looks something like this:

  "numberOfResults": 376,
  "results": [
    {
      "name": "single",
      "docs": [
        {
          "id": "RAKDI342342",
          "type": "Culture",
          "category": "Culture",
          "media": "unknown",
          "label": "exampellabel",
          "title": "testtitle and titletest",
          "subtitle": "Archive" 

            ]
        },
        {
          "id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER",
          "type": "Culture",
          "category": "Culture",
          "media": "image",
          "label": "more label als example",
          "title": "test the second title",
          "subtitle": "picture"
          

and so on.

Within the "docs"-part are all the actual results, starting with "id". Once all the information is there, the next block starting with "id" simply follows.

Now I am trying to create a table with the keys id, label and title (for a start) for each of these separate blocks (in this case actual items).

After defining the search_url (where I get the json from), my code for this currently looks like this:

result = requests.get(search_url)
data = result.json()
data.keys() 

With this, I get told that they dict_keys are the following:

dict_keys(['numberOfResults', 'results', 'facets', 'entities', 'fulltexts', 'correctedQuery', 'highlightedTerms', 'randomSeed', 'nextCursorMark'])

Given the json from above, I know I want to look into "results" and then further into "docs". According to the documentation I found, I should be able to achieve this by addressing the results-part directly and then addressing the nested bit by separating the fields with ".". I have now tried the following the code:

fields = ["docs.id", "docs.label", "docs.title"]
df = pd.json_normalize(data["results"])
df[fields]

This works until df[field] - at this stage the programm tells me:

KeyError: "['docs.id'] not in index"

It does work for the level above though, so if I try the same with "name" and "docs" I get a lovely dataframe. What am I doing wrong? I am still a python and pandas beginner and would appreciate any help very much!

EDIT:

The desired dataframe output would look roughly like this:

    id              label            title  
0   RAKDI342342     exampellabel     testtitle and titletest    
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
ssp24
  • 121
  • 1
  • 8

1 Answers1

2
import pandas as pd

df = pd.json_normalize(data, record_path=['results', 'docs'], meta=[['results', 'name'], 'numberOfResults'])

display(df)
                                 id     type category    media                   label                    title subtitle results.name numberOfResults
0                       RAKDI342342  Culture  Culture  unknown            exampellabel  testtitle and titletest  Archive       single             376
1  GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER  Culture  Culture    image  more label als example    test the second title  picture       single             376

Data

  • The posted JSON / Dict is not correctly formed
  • Assuming the following corrected form
data = \
{'numberOfResults': 376,
 'results': [{'docs': [{'category': 'Culture',
                        'id': 'RAKDI342342',
                        'label': 'exampellabel',
                        'media': 'unknown',
                        'subtitle': 'Archive',
                        'title': 'testtitle and titletest',
                        'type': 'Culture'},
                       {'category': 'Culture',
                        'id': 'GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER',
                        'label': 'more label als example',
                        'media': 'image',
                        'subtitle': 'picture',
                        'title': 'test the second title',
                        'type': 'Culture'}],
              'name': 'single'}]}
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158