-2

I am trying to convert JSON data into a CSV in Python and found this code listed on Stack Exchange from a while back (link:How can I convert JSON to CSV?). It no longer works in Python 3, giving me different errors. Anyone know how to fix for Python 3? Thanks.

Below is my JSON data:

{ "fruit": [ 
    { "name": "Apple", 
      "binomial name": "Malus domestica", 
      "major_producers": [ "China", "United States", "Turkey" ], 
      "nutrition": 
          { "carbohydrates": "13.81g", 
            "fat": "0.17g", 
            "protein": "0.26g" 
          } 
     }, 
     { "name": "Orange", 
       "binomial name": "Citrus x sinensis", 
       "major_producers": [ "Brazil", "United States", "India" ],
       "nutrition": 
          { "carbohydrates": "11.75g", 
            "fat": "0.12g", 
            "protein": "0.94g" 
          } 
      }, 
      { "name": "Mango", 
        "binomial name": "Mangifera indica", 
        "major_producers": [ "India", "China", "Thailand" ],
        "nutrition": 
            { "carbohydrates": "15g", 
              "fat": "0.38g", 
              "protein": "0.82g" 
            } 
       } 
] }

The output CSV should look like

enter image description here

Nayantara Jeyaraj
  • 2,624
  • 7
  • 34
  • 63

2 Answers2

1

the most easiest way to go would be throwing the desired dict into a pandas dataframe and use its .to_csv() method:

json_data = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }
df = pd.DataFrame(json_data['fruit'])
df.to_csv('/wherever/file/shall/roam/test.csv')

which leads to a csv file like

enter image description here

SpghttCd
  • 10,510
  • 2
  • 20
  • 25
0

Still using pandas but slightly different approach by treating your JSON as a dictionary

import pandas as pd
import pprint as pprint
x = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }

add some additional information to the dict that will give additional headers closer to the desired output.

for item in x['fruit']:
  for index, country in enumerate(item['major_producers']):
    new_key = 'major_producers'+str(index + 1)
    item[new_key] = country
  item['carbs'] = item['nutrition']['carbohydrates']
  item['fat'] = item['nutrition']['fat']
  item['protein']= item['nutrition']['protein']

pretty print of the updated dict

pprint(x['fruit'])

enter image description here

Create the pandas dataframe from the list of dicts as in:

xdf = pd.DataFrame.from_dict(x['fruit'])

Use only the headers you require

xdf = xdf[['name', 'binomial name', 'major_producers1','major_producers2','major_producers3','carbs','fat','protein']]

Then as @SpghttCd mentions you can use the pd.to_csv. No need for index in this case.

xdf.to_csv('filename.csv',index=False)

The csv file should look like this:

enter image description here

Timothy Lombard
  • 917
  • 1
  • 11
  • 31
  • Exactly., Is it possible to make the script as the generic one, so that it will work for any nested json file? – Ravi Kumar T C May 22 '18 at 10:18
  • I am working on a project. There is one particular job which run in my system daily and it will be generating the JSON file (same as the above nested JSON). I need to convert this data into CSV format for reporting. – Ravi Kumar T C May 22 '18 at 10:19
  • @RaviKumarTC the trick, in this case, is to know your data source. (hopefully, it's somewhat consistent) If a key: value is a list, it needs to be handled differently than if the value is another dict. The bottom line to work with pandas is to get a list of simple key:value dicts for each row in your CSV. – Timothy Lombard May 22 '18 at 10:28
  • Thank you @Timothy Lombard. Could you please help me in this case? My system will be generating a JSON file(nested json) on daily basis. I am trying to write a script in pyton 3 to covert it into a csv file. Note- the script should work for any nested JSON data. – Ravi Kumar T C May 22 '18 at 10:44