Nested Json to csv with python 3

Question

I am trying to convert JSON data into a CSV in Python and found this code listed on Stack Exchange from a while back (link:How can I convert JSON to CSV?). It no longer works in Python 3, giving me different errors. Anyone know how to fix for Python 3? Thanks.

Below is my JSON data:

{ "fruit": [ 
    { "name": "Apple", 
      "binomial name": "Malus domestica", 
      "major_producers": [ "China", "United States", "Turkey" ], 
      "nutrition": 
          { "carbohydrates": "13.81g", 
            "fat": "0.17g", 
            "protein": "0.26g" 
          } 
     }, 
     { "name": "Orange", 
       "binomial name": "Citrus x sinensis", 
       "major_producers": [ "Brazil", "United States", "India" ],
       "nutrition": 
          { "carbohydrates": "11.75g", 
            "fat": "0.12g", 
            "protein": "0.94g" 
          } 
      }, 
      { "name": "Mango", 
        "binomial name": "Mangifera indica", 
        "major_producers": [ "India", "China", "Thailand" ],
        "nutrition": 
            { "carbohydrates": "15g", 
              "fat": "0.38g", 
              "protein": "0.82g" 
            } 
       } 
] }

The output CSV should look like

No code in your question and no error, what could we do with that? — Ofer Sadan, May 22 '18 at 07:56
These are the errors am getting: string indices must be integers, attributeError: 'str' object has no attribute 'keys', TypeError: string indices must be integers, list index out of range, 'dict_keys' object is not subscriptable — Ravi Kumar T C, May 22 '18 at 08:43

score 1 · Answer 1 · answered May 22 '18 at 08:14

the most easiest way to go would be throwing the desired dict into a pandas dataframe and use its .to_csv() method:

json_data = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }
df = pd.DataFrame(json_data['fruit'])
df.to_csv('/wherever/file/shall/roam/test.csv')

which leads to a csv file like

But my CSV output should look like as mentioned above – Ravi Kumar T C May 22 '18 at 09:29 — Ravi Kumar T C, May 22 '18 at 09:29

score 0 · Answer 2 · answered May 22 '18 at 10:05

Still using pandas but slightly different approach by treating your JSON as a dictionary

import pandas as pd
import pprint as pprint
x = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }

add some additional information to the dict that will give additional headers closer to the desired output.

for item in x['fruit']:
  for index, country in enumerate(item['major_producers']):
    new_key = 'major_producers'+str(index + 1)
    item[new_key] = country
  item['carbs'] = item['nutrition']['carbohydrates']
  item['fat'] = item['nutrition']['fat']
  item['protein']= item['nutrition']['protein']

pretty print of the updated dict

pprint(x['fruit'])

Create the pandas dataframe from the list of dicts as in:

xdf = pd.DataFrame.from_dict(x['fruit'])

Use only the headers you require

xdf = xdf[['name', 'binomial name', 'major_producers1','major_producers2','major_producers3','carbs','fat','protein']]

Then as @SpghttCd mentions you can use the pd.to_csv. No need for index in this case.

xdf.to_csv('filename.csv',index=False)

The csv file should look like this:

Exactly., Is it possible to make the script as the generic one, so that it will work for any nested json file? — Ravi Kumar T C, May 22 '18 at 10:18
I am working on a project. There is one particular job which run in my system daily and it will be generating the JSON file (same as the above nested JSON). I need to convert this data into CSV format for reporting. — Ravi Kumar T C, May 22 '18 at 10:19
@RaviKumarTC the trick, in this case, is to know your data source. (hopefully, it's somewhat consistent) If a key: value is a list, it needs to be handled differently than if the value is another dict. The bottom line to work with pandas is to get a list of simple key:value dicts for each row in your CSV. — Timothy Lombard, May 22 '18 at 10:28
Thank you @Timothy Lombard. Could you please help me in this case? My system will be generating a JSON file(nested json) on daily basis. I am trying to write a script in pyton 3 to covert it into a csv file. Note- the script should work for any nested JSON data. — Ravi Kumar T C, May 22 '18 at 10:44

Nested Json to csv with python 3

2 Answers2