I have a JSON file with the following content:

response = {
  "classifier_id": "xxxxx-xx-1",
  "url": "/testers/xxxxx-xx-1",
  "collection": [
    {
      "text": "How hot will it be today?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 0.993
        },
        {
          "class_name": "conditions",
          "confidence": 0.006
        }
      ]
    },
    {
      "text": "Is it hot outside?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 1.0
        },
        {
          "class_name": "conditions",
          "confidence": 0.0
        }
      ]
    }
  ]
}

Current output: [image]

Code and undesired output: [image]

I tried json_normalize, but it gives duplicates.

How can I convert this JSON file to a pandas DataFrame?

Each record in the collection should be expanded wide (one row per record), not long.

Result: [image of the DataFrame]

Trenton McKinney

2 Answers

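import pandas as pd

# flatten_json is a helper that is not defined in this answer; it is assumed to
# flatten each nested record into a single-level dict
df = pd.DataFrame([flatten_json(x) for x in response['collection']])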

# display(df)
                        text    top_class classes_0_class_name  classes_0_confidence classes_1_class_name  classes_1_confidence
0  How hot will it be today?  temperature          temperature                 0.993           conditions                 0.006
1         Is it hot outside?  temperature          temperature                 1.000           conditions                 0.000
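
The call above relies on a flatten_json helper whose definition is not shown here. The following is a minimal sketch of what such a helper might look like, assuming it recursively walks nested dicts and lists and joins the keys and list indices with underscores. The implementation and the sep parameter are illustrative assumptions, not the author's actual function; the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}


def flatten_json(nested, sep="_"):
    """Hypothetical helper: flatten nested dicts/lists into one flat dict."""
    out = {}

    def _flatten(value, prefix=""):
        if isinstance(value, dict):
            # Descend into each key, extending the prefix with the key name.
            for key, child in value.items():
                _flatten(child, f"{prefix}{key}{sep}")
        elif isinstance(value, list):
            # Descend into each item, extending the prefix with its position.
            for index, child in enumerate(value):
                _flatten(child, f"{prefix}{index}{sep}")
        else:
            # Leaf value: store it under the prefix without the trailing separator.
            out[prefix[: len(prefix) - len(sep)]] = value

    _flatten(nested)
    return out


# One flat dict per record, so each record becomes exactly one row.
df = pd.DataFrame([flatten_json(x) for x in response["collection"]])
```

Because every record is flattened independently, each class ends up in its own pair of columns (classes_0_..., classes_1_...) rather than in its own row.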
Trenton McKinney

If json_normalize() doesn't handle your JSON structure properly, you can parse it with custom logic. Here is an example:

import json
import pandas as pd

# define dictionary with desired structure
d = {
    'text': [],
    'top_class': [],
    'temperature': [],
    'conditions': []
}

# parse only if response is a JSON string; in the question it is already a dict
data = json.loads(response) if isinstance(response, str) else response

# iterate over the collection and extract the elements needed
for el in data['collection']:
    d['text'].append(el['text'])
    d['top_class'].append(el['top_class'])
    # look up the confidence of each class by its class_name
    d['temperature'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'temperature'][0])
    d['conditions'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'conditions'][0])

df = pd.DataFrame(d)

df.head()

Output: [image]
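
As an alternative to the manual loop, here is a sketch (not part of the original answer) that keeps json_normalize and then reshapes the result. With record_path, json_normalize returns one row per class, so text and top_class repeat for every class, which is probably where the duplicates in the question come from. Unstacking class_name then turns that long table into a wide one. The names long_df and wide_df are illustrative, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}

# Long form: one row per (record, class); text and top_class are repeated.
long_df = pd.json_normalize(
    response["collection"],
    record_path="classes",
    meta=["text", "top_class"],
)

# Wide form: move class_name into the columns, one row per record.
wide_df = (
    long_df.set_index(["text", "top_class", "class_name"])["confidence"]
    .unstack("class_name")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "class_name" label on the columns
```

Unlike the hard-coded dictionary, this version picks up whatever class names appear in the data, but it assumes each class_name occurs at most once per record; unstack raises an error on duplicate entries.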

Alexandra Dudkina