How to flatten JSON into a wide format in pandas

Question

I have the following JSON, already loaded into a Python dict named response:

response = {
  "classifier_id": "xxxxx-xx-1",
  "url": "/testers/xxxxx-xx-1",
  "collection": [
    {
      "text": "How hot will it be today?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 0.993
        },
        {
          "class_name": "conditions",
          "confidence": 0.006
        }
      ]
    },
    {
      "text": "Is it hot outside?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 1.0
        },
        {
          "class_name": "conditions",
          "confidence": 0.0
        }
      ]
    }
  ]
}

Current Output

(Image not available: code and undesired output.)

I tried json_normalize, but the result contains duplicated rows.
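The exact json_normalize call was only shown as an image, so the following is an assumed reconstruction. Normalizing the classes records with record_path and meta produces one row per class, so each text and top_class is repeated once per class. This is a long-format result, which is where the apparent duplicates come from.

```python
import pandas as pd

response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 0.993},
                {"class_name": "conditions", "confidence": 0.006},
            ],
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 1.0},
                {"class_name": "conditions", "confidence": 0.0},
            ],
        },
    ],
}

# One row per entry in each 'classes' list; the meta fields
# 'text' and 'top_class' are repeated for every class (long format)
long_df = pd.json_normalize(
    response["collection"], record_path="classes", meta=["text", "top_class"]
)
print(long_df)
```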

How can I convert this JSON to a pandas DataFrame?

Each record in 'collection' should become a single row, with its classes expanded into columns (wide format) rather than into extra rows (long format).

Desired result: (image of the expected DataFrame not available)

score 2 · Accepted Answer · answered Oct 01 '20 at 23:44

Using the flatten_json function from the SO answer "How to flatten a nested JSON recursively, with flatten_json?", each record in 'collection' can be flattened into a single row, giving a wide-format DataFrame.
The column headers can be renamed, as needed, with pandas.DataFrame.rename.

import pandas as pd

# flatten_json is defined in the linked SO answer
df = pd.DataFrame([flatten_json(x) for x in response['collection']])

# display(df)
                        text    top_class classes_0_class_name  classes_0_confidence classes_1_class_name  classes_1_confidence
0  How hot will it be today?  temperature          temperature                 0.993           conditions                 0.006
1         Is it hot outside?  temperature          temperature                 1.000           conditions                 0.000
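flatten_json itself is not reproduced here. As an assumption, the following is a minimal recursive sketch that yields the same classes_0_class_name style column names; it is not necessarily identical to the version in the linked answer. The rename mapping at the end is hypothetical and assumes temperature is always classes_0 and conditions is always classes_1, as in this sample.

```python
import pandas as pd


def flatten_json(nested, sep="_"):
    """Minimal sketch: recursively flatten nested dicts/lists into one flat dict.

    Dict keys and list indices along each path are joined with `sep`,
    e.g. {"classes": [{"confidence": 0.9}]} -> {"classes_0_confidence": 0.9}.
    """
    out = {}

    def _flatten(value, prefix):
        if isinstance(value, dict):
            for key, item in value.items():
                _flatten(item, f"{prefix}{key}{sep}")
        elif isinstance(value, list):
            for i, item in enumerate(value):
                _flatten(item, f"{prefix}{i}{sep}")
        else:
            # leaf value: drop the trailing separator from the accumulated path
            out[prefix[: -len(sep)]] = value

    _flatten(nested, "")
    return out


response = {
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 0.993},
                {"class_name": "conditions", "confidence": 0.006},
            ],
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 1.0},
                {"class_name": "conditions", "confidence": 0.0},
            ],
        },
    ],
}

# One flat dict per record -> one row per record (wide format)
df = pd.DataFrame([flatten_json(x) for x in response["collection"]])

# Hypothetical renaming; only valid if the class order is the same in every record
df = df.rename(columns={
    "classes_0_confidence": "temperature",
    "classes_1_confidence": "conditions",
})
print(df)
```

Because the generated names depend on list positions, a record whose classes come in a different order would put values under the wrong renamed column.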

score 1 · Answer 2 · answered Sep 23 '20 at 08:54

If json_normalize() doesn't handle your JSON structure properly, you can parse it with custom logic. Here is an example:

import pandas as pd

# define a dictionary of lists with the desired columns
d = {
    'text': [],
    'top_class': [],
    'temperature': [],
    'conditions': []
}

# response is already a dict; if it were a JSON string,
# parse it first with json.loads(response)
data = response

# iterate over collection and extract the elements needed
for el in data['collection']:
    d['text'].append(el['text'])
    d['top_class'].append(el['top_class'])
    # confidence for each class, or None if that class is missing
    d['temperature'].append(next((e['confidence'] for e in el['classes'] if e['class_name'] == 'temperature'), None))
    d['conditions'].append(next((e['confidence'] for e in el['classes'] if e['class_name'] == 'conditions'), None))

df = pd.DataFrame(d)

df.head()

Output: (image not available)
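A variant of the same custom-logic idea, sketched under the assumption that every class_name should become its own column: instead of hardcoding 'temperature' and 'conditions', build one dict per record and let pandas align the columns. A class missing from a record becomes NaN. This assumes no class_name collides with the 'text' or 'top_class' keys.

```python
import pandas as pd

response = {
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 0.993},
                {"class_name": "conditions", "confidence": 0.006},
            ],
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 1.0},
                {"class_name": "conditions", "confidence": 0.0},
            ],
        },
    ],
}

rows = []
for record in response["collection"]:
    # start from the scalar fields, then add one column per class_name
    row = {"text": record["text"], "top_class": record["top_class"]}
    row.update({c["class_name"]: c["confidence"] for c in record["classes"]})
    rows.append(row)

# pandas aligns the dict keys into columns; missing keys become NaN
df = pd.DataFrame(rows)
print(df)
```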
