0

I would like to ask for some help with the conversion of a nested json into pandas df. I have read the quite brilliant input from a couple of year past, but that is outdated now. :( Flatten double nested JSON

So here is a sample of my input data (mind that classes might contain up to 10 class name and confidence pairs):

[
  {
    "classifier_id": "my_classifier_id",
    "url": "https://api.eu-de.natural-language-classifier...",
    "text": "for sales? aligning obsolete incentive\u00a0 system to what is the standard today: 100% reference salary, 100% of ref sal if you hit 100% of your quota",
    "top_class": "conditions",
    "classes": [
      {
        "class_name": "conditions",
        "confidence": 0.9074866214536228
      },
      {
        "class_name": "temperature",
        "confidence": 0.09251337854637723
      }
    ]
  },
  {
    "classifier_id": "my_classifier_id",
    "url": "https://api.eu-de.natural-language-classifier...",
    "text": "Complete integration of incentives.\u00a0 People act inline with how they are compensated as the general rule. \u00a0 If we get that right then this model can genuinely change the face of IBM to the client.",
    "top_class": "conditions",
    "classes": [
      {
        "class_name": "conditions",
        "confidence": 0.9683663322166756
      },
      {
        "class_name": "temperature",
        "confidence": 0.0316336677833244
      }
    ]
  },
  {
    "classifier_id": "my_classifier_id",
    "url": "https://api.eu-de.natural-language-classifier.watson...",
    "text": "Enablement, operational support on the most basic things",
    "top_class": "temperature",
    "classes": [
      {
        "class_name": "temperature",
        "confidence": 0.8174158442711534
      },
      {
        "class_name": "conditions",
        "confidence": 0.1825841557288465
      }
    ]
  }
]

What I have tried thus far in python:

data_df = pd.read_json(r'C:\Users\...\Documents\Python NLP\WATSON NLC\OUTPUT JSON\nlc_data_full.json')

When using this the classes still remain in a json like form:

[{'class_name': 'conditions', 'confidence': 0.907486621453622}, {'class_name': 'temperature', 'confidence': 0.092513378546377}]
[{'class_name': 'conditions', 'confidence': 0.9683663322166751}, {'class_name': 'temperature', 'confidence': 0.031633667783324}]
[{'class_name': 'temperature', 'confidence': 0.8174158442711531}, {'class_name': 'conditions', 'confidence': 0.182584155728846}]

I would love to get a format that can be worked on in excel. Thank you for looking into this.

1 Answers1

0

Well I think I managed to figure out what everyone already knew anyways. LOL So the magic is in the pd.json_normalize function. With the parameters it takes it basically is able to open multinested json files with relative ease.

Also the pandas site has been a good friend as always: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html

I am calling my dataset: nlc_data = [ .......

Here is a super lightweight solution for cases that do not have such intricate nesting: normie_2 = pd.json_normalize(nlc_data, max_level=0)

This one works for multi nested json files:

result = pd.json_normalize(nlc_data, 'classes', ['text', 'top_class'])

Well I guess I got a lot smarter today. Bare with me ... I just might have another awesome questions tomorrow. Bye, Levi