312

I have a JSON file I want to convert to a CSV file. How can I do this with Python?

I tried:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)

f.close()

However, it did not work. I am using Django and the error I received is:

`file' object has no attribute 'writerow'`

I then tried the following:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)  # ← changed

f.close()

I then get the error:

`sequence expected`

Sample json file:

[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
little_fish
  • 4,169
  • 5
  • 20
  • 8

27 Answers27

257

With the pandas library, this is as easy as using two commands!

df = pd.read_json()

read_json converts a JSON string to a pandas object (either a series or dataframe). Then:

df.to_csv()

Which can either return a string or write directly to a csv-file. See the docs for to_csv.

Based on the verbosity of previous answers, we should all thank pandas for the shortcut.

For unstructured JSON see this answer.

EDIT: Someone asked for a working minimal example:

import pandas as pd

with open('jsonfile.json', encoding='utf-8') as inputfile:
    df = pd.read_json(inputfile)

df.to_csv('csvfile.csv', encoding='utf-8', index=False)
vmg
  • 4,176
  • 2
  • 20
  • 33
  • 1
    This can be done as a minimal one-liner too: `curl url | python -c 'import sys,pandas as pd;pd.read_json(sys.stdin).to_csv(sys.stdout)'` – keithpjolley May 11 '23 at 15:12
  • Pandas CSV converter completely ignored a long text column I had when trying to convert. Apparently it has some sort of limit on length of fields? – Dr. Chocolate Jul 31 '23 at 14:05
150

First, your JSON has nested objects, so it normally cannot be directly converted to CSV. You need to change that to something like this:

{
    "pk": 22,
    "model": "auth.permission",
    "codename": "add_logentry",
    "content_type": 8,
    "name": "Can add log entry"
},
......]

Here is my code to generate CSV from that:

import csv
import json

x = """[
    {
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    },
    {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    },
    {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }
]"""

x = json.loads(x)

f = csv.writer(open("test.csv", "wb+"))

# Write CSV Header, If you dont need that, remove this line
f.writerow(["pk", "model", "codename", "name", "content_type"])

for x in x:
    f.writerow([x["pk"],
                x["model"],
                x["fields"]["codename"],
                x["fields"]["name"],
                x["fields"]["content_type"]])

You will get output as:

pk,model,codename,name,content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
Boris Verkhovskiy
  • 14,854
  • 11
  • 100
  • 103
YOU
  • 120,166
  • 34
  • 186
  • 219
  • 3
    this is work but sorry before can i get something that not hard code i thing it better id i can use f.writerow(a) and the a is some variabel that i declare before thanks before – little_fish Dec 09 '09 at 08:16
  • For me this works almost perfectly. In the exported CSV, some of the fields are surrounded by `[u'` and `']`. What's the (non-post-processing) workaround? if there is one... :) – Dror Jul 10 '14 at 12:20
  • 3
    Below I've shown a way to do it more generally, without having to hard-code it – Alec McGail Aug 26 '15 at 21:11
  • 9
    hey, i've tried this but I'm getting a `TypeError: a bytes-like object is required, not 'str'` at `f.writerow(['pk', 'model', 'codename', 'name', 'content_type'])` – Aditya Hariharan Mar 07 '17 at 09:31
  • @AdityaHariharan, is that python3? – YOU Mar 07 '17 at 09:33
  • May be take a look at - http://stackoverflow.com/questions/7200606/python3-writing-csv-files for python3 – YOU Mar 07 '17 at 09:39
  • Although it's been long time but if you are getting "TypeError: a bytes-like object is required, not 'str'" error then try f = csv.writer(open("test.csv", "wb+")) – Amit Gupta Mar 29 '19 at 06:53
  • 13
    for python3 change line with opening csv file to `f = csv.writer(open("test.csv", "w", newline=''))` – PiotrK Apr 14 '19 at 12:15
  • 1
    `TypeError: a bytes-like object is required, not 'str'` is about the encoding type in Python3. Try `f.writer(open('test.csv', 'w', encoding='utf8'))`. – Fred Lai Jul 29 '20 at 22:04
117

I am assuming that your JSON file will decode into a list of dictionaries. First we need a function which will flatten the JSON objects:

def flattenjson(b, delim):
    val = {}
    for i in b.keys():
        if isinstance(b[i], dict):
            get = flattenjson(b[i], delim)
            for j in get.keys():
                val[i + delim + j] = get[j]
        else:
            val[i] = b[i]
            
    return val

The result of running this snippet on your JSON object:

flattenjson({
    "pk": 22, 
    "model": "auth.permission", 
    "fields": {
      "codename": "add_message", 
      "name": "Can add message", 
      "content_type": 8
    }
  }, "__")

is

{
    "pk": 22, 
    "model": "auth.permission", 
    "fields__codename": "add_message", 
    "fields__name": "Can add message", 
    "fields__content_type": 8
}

After applying this function to each dict in the input array of JSON objects:

input = map(lambda x: flattenjson( x, "__" ), input)

and finding the relevant column names:

columns = [x for row in input for x in row.keys()]
columns = list(set(columns))

it's not hard to run this through the csv module:

with open(fname, 'wb') as out_file:
    csv_w = csv.writer(out_file)
    csv_w.writerow(columns)

    for i_r in input:
        csv_w.writerow(map(lambda x: i_r.get(x, ""), columns))
starball
  • 20,030
  • 7
  • 43
  • 238
Alec McGail
  • 1,301
  • 1
  • 8
  • 7
42

JSON can represent a wide variety of data structures -- a JS "object" is roughly like a Python dict (with string keys), a JS "array" roughly like a Python list, and you can nest them as long as the final "leaf" elements are numbers or strings.

CSV can essentially represent only a 2-D table -- optionally with a first row of "headers", i.e., "column names", which can make the table interpretable as a list of dicts, instead of the normal interpretation, a list of lists (again, "leaf" elements can be numbers or strings).

So, in the general case, you can't translate an arbitrary JSON structure to a CSV. In a few special cases you can (array of arrays with no further nesting; arrays of objects which all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you do have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, neither usable case in fact applies, and your problem is impossible to solve. But please do clarify!

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
40

Use json_normalize from pandas:

  • Using the sample data from the OP in a file named test.json.
  • encoding='utf-8' has been used here, but may not be necessary for other cases.
  • The following code takes advantage of the pathlib library.
    • .open is a method of pathlib.
    • Works with non-Windows paths too.
  • Use pandas.to_csv(...) to save the data to a csv file.
import pandas as pd
from pathlib import Path
import json

# set path to file
p = Path(r'c:\some_path_to_file\test.json')

# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create dataframe
df = pd.json_normalize(data)

# dataframe view
 pk            model  fields.codename           fields.name  fields.content_type
 22  auth.permission     add_logentry     Can add log entry                    8
 23  auth.permission  change_logentry  Can change log entry                    8
 24  auth.permission  delete_logentry  Can delete log entry                    8
  4  auth.permission        add_group         Can add group                    2
 10  auth.permission      add_message       Can add message                    4

# save to csv
df.to_csv('test.csv', index=False, encoding='utf-8')

CSV Output:

pk,model,fields.codename,fields.name,fields.content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
4,auth.permission,add_group,Can add group,2
10,auth.permission,add_message,Can add message,4

Resources for more heavily nested JSON objects:

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
37

A generic solution which translates any json list of flat objects to csv.

Pass the input.json file as first argument on command line.

import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
    output.writerow(row.values())
Mike Repass
  • 6,825
  • 5
  • 38
  • 35
  • 3
    An important comment - this code infers the columns/headers from the fields in the very first row. If your json data has 'jagged' columns, i.e. lets say row1 has 5 columns but row2 has 6 columns, then you need to do a first pass over the data to get the total set of all columns and use that as the headers. – Mike Repass Dec 07 '12 at 21:59
  • With the data I had this was a great part of the solution I needed, since my JSON was not jagged it worked wonderfully with some slight adjustments for the output since I was running this within an existing script. – MichaelF Apr 24 '14 at 15:00
  • 1
    This code also assumes that the values will be output in the same order as the keys in the header row. While that may have worked by luck, it is by no means guaranteed. – RyanHennig Jul 28 '15 at 19:09
  • Getting encoding error. Any idea how to add encoding to utf-8? – Elad Tabak Apr 07 '16 at 09:34
26

This code should work for you, assuming that your JSON data is in a file called data.json.

import json
import csv

with open("data.json") as file:
    data = json.load(file)

with open("data.csv", "w") as file:
    csv_file = csv.writer(file)
    for item in data:
        fields = list(item['fields'].values())
        csv_file.writerow([item['pk'], item['model']] + fields)
Dan Loewenherz
  • 10,879
  • 7
  • 50
  • 81
  • 1
    Hmmm, no -- `csv_file.writerow` (there is no `f.writerow` of course, I assume you made a typo there!) wants a sequence, not a dict -- and in your example, each item is a dict. This would work for the OTHER special case, as I identified in my answer - where the JSON file has an array of arrays; it doesn't work for an array of objects, which is the special case you appear to be trying to solve (that one requires a `csv.DictWriter` -- and of course you need to extract the field names and decide on an order in order to instantiate it!-). – Alex Martelli Dec 09 '09 at 04:54
  • 1
    @DanLoewenherz That doesn't work on recent Python versions. TypeError: can only concatenate list (not "dict_values") to list – Datacrawler Jul 24 '19 at 16:07
  • 1
    Perfect answer. – Akshat Gupta Oct 12 '20 at 14:47
22

It'll be easy to use csv.DictWriter(),the detailed implementation can be like this:

def read_json(filename):
    return json.loads(open(filename).read())
def write_csv(data,filename):
    with open(filename, 'w+') as outf:
        writer = csv.DictWriter(outf, data[0].keys())
        writer.writeheader()
        for row in data:
            writer.writerow(row)
# implement
write_csv(read_json('test.json'), 'output.csv')

Note that this assumes that all of your JSON objects have the same fields.

Here is the reference which may help you.

Steeve
  • 385
  • 2
  • 13
NullPointer
  • 412
  • 7
  • 17
6

I was having trouble with Dan's proposed solution, but this worked for me:

import json
import csv 

f = open('test.json')
data = json.load(f)
f.close()

f=csv.writer(open('test.csv','wb+'))

for item in data:
  f.writerow([item['pk'], item['model']] + item['fields'].values())

Where "test.json" contained the following:

[ 
{"pk": 22, "model": "auth.permission", "fields": 
  {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8 } }, 
{"pk": 23, "model": "auth.permission", "fields": 
  {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8 } }, {"pk": 24, "model": "auth.permission", "fields": 
  {"codename": "delete_logentry", "name": "Can delete log entry", "content_type": 8 } }
]
Community
  • 1
  • 1
Amanda
  • 12,099
  • 17
  • 63
  • 91
  • Got error on trying your program on your sample data C:\curl>python json2csv.py Traceback (most recent call last): File "json2csv.py", line 11, in f.writerow([item['pk'], item['model']] + item['fields'].values()) TypeError: can only concatenate list (not "dict_values") to list – Dr. Mian Dec 10 '15 at 11:54
  • Tried it again just now in Python 2.7.9 and it works fine for me. – Amanda Dec 10 '15 at 19:33
5

Alec's answer is great, but it doesn't work in the case where there are multiple levels of nesting. Here's a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):

"""Converts JSON with nested fields into a flattened CSV file.
"""

import sys
import json
import csv
import os

import jsonlines

from orderedset import OrderedSet

# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
  if val is None:
    val = {}

  if isinstance( b, dict ):
    for j in b.keys():
      flattenjson(b[j], prefix + delim + j, delim, val)
  elif isinstance( b, list ):
    get = b
    for j in range(len(get)):
      key = str(j)

      # If the nested data contains its own key, use that as the header instead.
      if isinstance( get[j], dict ):
        if 'key' in get[j]:
          key = get[j]['key']

      flattenjson(get[j], prefix + delim + key, delim, val)
  else:
    val[prefix] = b

  return val

def main(argv):
  if len(argv) < 2:
    raise Error('Please specify a JSON file to parse')

  print "Loading and Flattening..."
  filename = argv[1]
  allRows = []
  fieldnames = OrderedSet()
  with jsonlines.open(filename) as reader:
    for obj in reader:
      # print 'orig:\n'
      # print obj
      flattened = flattenjson(obj)
      #print 'keys: %s' % flattened.keys()
      # print 'flattened:\n'
      # print flattened
      fieldnames.update(flattened.keys())
      allRows.append(flattened)

  print "Exporting to CSV..."
  outfilename = filename + '.csv'
  count = 0
  with open(outfilename, 'w') as file:
    csvwriter = csv.DictWriter(file, fieldnames=fieldnames)
    csvwriter.writeheader()
    for obj in allRows:
      # print 'allRows:\n'
      # print obj
      csvwriter.writerow(obj)
      count += 1

  print "Wrote %d rows" % count



if __name__ == '__main__':
  main(sys.argv)
phreakhead
  • 14,721
  • 5
  • 39
  • 40
5

This is a modification of @MikeRepass's answer. This version writes the CSV to a file, and works for both Python 2 and Python 3.

import csv,json
input_file="data.json"
output_file="data.csv"
with open(input_file) as f:
    content=json.load(f)
try:
    context=open(output_file,'w',newline='') # Python 3
except TypeError:
    context=open(output_file,'wb') # Python 2
with context as file:
    writer=csv.writer(file)
    writer.writerow(content[0].keys()) # header row
    for row in content:
        writer.writerow(row.values())
cowlinator
  • 7,195
  • 6
  • 41
  • 61
4

As mentioned in the previous answers the difficulty in converting json to csv is because a json file can contain nested dictionaries and therefore be a multidimensional data structure verses a csv which is a 2D data structure. However, a good way to turn a multidimensional structure to a csv is to have multiple csvs that tie together with primary keys.

In your example, the first csv output has the columns "pk","model","fields" as your columns. Values for "pk", and "model" are easy to get but because the "fields" column contains a dictionary, it should be its own csv and because "codename" appears to the be the primary key, you can use as the input for "fields" to complete the first csv. The second csv contains the dictionary from the "fields" column with codename as the the primary key that can be used to tie the 2 csvs together.

Here is a solution for your json file which converts a nested dictionaries to 2 csvs.

import csv
import json

def readAndWrite(inputFileName, primaryKey=""):
    input = open(inputFileName+".json")
    data = json.load(input)
    input.close()

    header = set()

    if primaryKey != "":
        outputFileName = inputFileName+"-"+primaryKey
        if inputFileName == "data":
            for i in data:
                for j in i["fields"].keys():
                    if j not in header:
                        header.add(j)
    else:
        outputFileName = inputFileName
        for i in data:
            for j in i.keys():
                if j not in header:
                    header.add(j)

    with open(outputFileName+".csv", 'wb') as output_file:
        fieldnames = list(header)
        writer = csv.DictWriter(output_file, fieldnames, delimiter=',', quotechar='"')
        writer.writeheader()
        for x in data:
            row_value = {}
            if primaryKey == "":
                for y in x.keys():
                    yValue = x.get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue).encode('utf8')
                    elif type(yValue) != dict:
                        row_value[y] = yValue.encode('utf8')
                    else:
                        if inputFileName == "data":
                            row_value[y] = yValue["codename"].encode('utf8')
                            readAndWrite(inputFileName, primaryKey="codename")
                writer.writerow(row_value)
            elif primaryKey == "codename":
                for y in x["fields"].keys():
                    yValue = x["fields"].get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue).encode('utf8')
                    elif type(yValue) != dict:
                        row_value[y] = yValue.encode('utf8')
                writer.writerow(row_value)

readAndWrite("data")
dmathewwws
  • 271
  • 2
  • 12
3

I know it has been a long time since this question has been asked but I thought I might add to everyone else's answer and share a blog post that I think explain the solution in a very concise way.

Here is the link

Open a file for writing

employ_data = open('/tmp/EmployData.csv', 'w')

Create the csv writer object

csvwriter = csv.writer(employ_data)
count = 0
for emp in emp_data:
      if count == 0:
             header = emp.keys()
             csvwriter.writerow(header)
             count += 1
      csvwriter.writerow(emp.values())

Make sure to close the file in order to save the contents

employ_data.close()
whale_steward
  • 2,088
  • 2
  • 25
  • 37
user3768804
  • 139
  • 1
  • 1
  • 15
3

It is not a very smart way to do it, but I have had the same problem and this worked for me:

import csv

f = open('data.json')
data = json.load(f)
f.close()

new_data = []

for i in data:
   flat = {}
   names = i.keys()
   for n in names:
      try:
         if len(i[n].keys()) > 0:
            for ii in i[n].keys():
               flat[n+"_"+ii] = i[n][ii]
      except:
         flat[n] = i[n]
   new_data.append(flat)  

f = open(filename, "r")
writer = csv.DictWriter(f, new_data[0].keys())
writer.writeheader()
for row in new_data:
   writer.writerow(row)
f.close()
rmNyro
  • 351
  • 2
  • 14
Mulder1
  • 41
  • 5
3

Surprisingly, I found that none of the answers posted here so far correctly deal with all possible scenarios (e.g., nested dicts, nested lists, None values, etc).

This solution should work across all scenarios:

def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys + [str(idx)], v, flattened)
        else:
            flattened['__'.join(keys)] = value

    flattened = {}
    for key in json.keys():
        process_value([key], json[key], flattened)
    return flattened
Max Berman
  • 31
  • 2
2

My simple way to solve this:

Create a new Python file like: json_to_csv.py

Add this code:

import csv, json, sys
#if you are not using utf-8 files, remove the next line
sys.setdefaultencoding("UTF-8")
#check if you pass the input file and output file
if sys.argv[1] is not None and sys.argv[2] is not None:

    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]

    inputFile = open(fileInput)
    outputFile = open(fileOutput, 'w')
    data = json.load(inputFile)
    inputFile.close()

    output = csv.writer(outputFile)

    output.writerow(data[0].keys())  # header row

    for row in data:
        output.writerow(row.values())

After add this code, save the file and run at the terminal:

python json_to_csv.py input.txt output.csv

I hope this help you.

SEEYA!

2

This code works for any given json file

# -*- coding: utf-8 -*-
"""
Created on Mon Jun 17 20:35:35 2019
author: Ram
"""

import json
import csv

with open("file1.json") as file:
    data = json.load(file)



# create the csv writer object
pt_data1 = open('pt_data1.csv', 'w')
csvwriter = csv.writer(pt_data1)

count = 0

for pt in data:

      if count == 0:

             header = pt.keys()

             csvwriter.writerow(header)

             count += 1

      csvwriter.writerow(pt.values())

pt_data1.close()
Ram Prajapati
  • 1,901
  • 1
  • 10
  • 8
2

If we consider the below example for converting the json format file to csv formatted file.

{
 "item_data" : [
      {
        "item": "10023456",
        "class": "100",
        "subclass": "123"
      }
      ]
}

The below code will convert the json file ( data3.json ) to csv file ( data3.csv ).

import json
import csv
with open("/Users/Desktop/json/data3.json") as file:
    data = json.load(file)
    file.close()
    print(data)

fname = "/Users/Desktop/json/data3.csv"

with open(fname, "w", newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerow(['dept',
                       'class',
                       'subclass'])
    for item in data["item_data"]:
         csv_file.writerow([item.get('item_data').get('dept'),
                            item.get('item_data').get('class'),
                            item.get('item_data').get('subclass')])

The above mentioned code has been executed in the locally installed pycharm and it has successfully converted the json file to the csv file. Hope this help to convert the files.

1

This works relatively well. It flattens the json to write it to a csv file. Nested elements are managed :)

That's for python 3

import json

o = json.loads('your json string') # Be careful, o must be a list, each of its objects will make a line of the csv.

def flatten(o, k='/'):
    global l, c_line
    if isinstance(o, dict):
        for key, value in o.items():
            flatten(value, k + '/' + key)
    elif isinstance(o, list):
        for ov in o:
            flatten(ov, '')
    elif isinstance(o, str):
        o = o.replace('\r',' ').replace('\n',' ').replace(';', ',')
        if not k in l:
            l[k]={}
        l[k][c_line]=o

def render_csv(l):
    ftime = True

    for i in range(100): #len(l[list(l.keys())[0]])
        for k in l:
            if ftime :
                print('%s;' % k, end='')
                continue
            v = l[k]
            try:
                print('%s;' % v[i], end='')
            except:
                print(';', end='')
        print()
        ftime = False
        i = 0

def json_to_csv(object_list):
    global l, c_line
    l = {}
    c_line = 0
    for ov in object_list : # Assumes json is a list of objects
        flatten(ov)
        c_line += 1
    render_csv(l)

json_to_csv(o)

enjoy.

Loïc
  • 11,804
  • 1
  • 31
  • 49
  • 1
    .csv file was not generated, instead, csv text was output to console. Also, `json.loads` was not working, I made it work with `json.load`, which nicely yields a list object. Third, nested elements were lost. – ZygD Apr 03 '19 at 15:08
1

Modified Alec McGail's answer to support JSON with lists inside

    def flattenjson(self, mp, delim="|"):
            ret = []
            if isinstance(mp, dict):
                    for k in mp.keys():
                            csvs = self.flattenjson(mp[k], delim)
                            for csv in csvs:
                                    ret.append(k + delim + csv)
            elif isinstance(mp, list):
                    for k in mp:
                            csvs = self.flattenjson(k, delim)
                            for csv in csvs:
                                    ret.append(csv)
            else:
                    ret.append(mp)

            return ret

Thanks!

taari
  • 954
  • 9
  • 8
1
import json,csv
t=''
t=(type('a'))
json_data = []
data = None
write_header = True
item_keys = []
try:
with open('kk.json') as json_file:
    json_data = json_file.read()

    data = json.loads(json_data)
except Exception as e:
    print( e)

with open('bar.csv', 'at') as csv_file:
    writer = csv.writer(csv_file)#, quoting=csv.QUOTE_MINIMAL)
    for item in data:
        item_values = []
        for key in item:
            if write_header:
                item_keys.append(key)
            value = item.get(key, '')
            if (type(value)==t):
                item_values.append(value.encode('utf-8'))
            else:
                item_values.append(value)
        if write_header:
            writer.writerow(item_keys)
            write_header = False
        writer.writerow(item_values)
0

Since the data appears to be in a dictionary format, it would appear that you should actually use csv.DictWriter() to actually output the lines with the appropriate header information. This should allow the conversion to be handled somewhat easier. The fieldnames parameter would then set up the order properly while the output of the first line as the headers would allow it to be read and processed later by csv.DictReader().

For example, Mike Repass used

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
  output.writerow(row.values())

However just change the initial setup to output = csv.DictWriter(filesetting, fieldnames=data[0].keys())

Note that since the order of elements in a dictionary is not defined, you might have to create fieldnames entries explicitly. Once you do that, the writerow will work. The writes then work as originally shown.

sabbahillel
  • 4,357
  • 1
  • 19
  • 36
0

Unfortunately I have not enouthg reputation to make a small contribution to the amazing @Alec McGail answer. I was using Python3 and I have needed to convert the map to a list following the @Alexis R comment.

Additionaly I have found the csv writer was adding a extra CR to the file (I have a empty line for each line with data inside the csv file). The solution was very easy following the @Jason R. Coombs answer to this thread: CSV in Python adding an extra carriage return

You need to simply add the lineterminator='\n' parameter to the csv.writer. It will be: csv_w = csv.writer( out_file, lineterminator='\n' )

derwyddon
  • 69
  • 1
  • 3
-1

You can use this code to convert a json file to csv file After reading the file, I am converting the object to pandas dataframe and then saving this to a CSV file

import os
import pandas as pd
import json
import numpy as np

data = []
os.chdir('D:\\Your_directory\\folder')
with open('file_name.json', encoding="utf8") as data_file:    
     for line in data_file:
        data.append(json.loads(line))

dataframe = pd.DataFrame(data)        
## Saving the dataframe to a csv file
dataframe.to_csv("filename.csv", encoding='utf-8',index= False)
Terminator17
  • 782
  • 1
  • 6
  • 13
  • 1
    this does not take subfields (such as "fields" in the example) into account - the sub-object is in one column instead of its contents separated into individual columns as well. – Cribber Feb 21 '19 at 10:50
-1

I have tried a lot of the suggested solution (also Panda was not correctly normalizing my JSON) but the real good one which is parsing correctly the JSON data is from Max Berman.

I wrote an improvement to avoid new columns for each row and puts it to the existing column during parsing. It has also the effect to store a value as a string if only one data exists, and make a list if there are more values for that columns.

It takes an input.json file for input and spits out an output.csv.

    import json
    import pandas as pd
    
    
    def same_length(flattened: dict):
        max = 0
        for key in flattened.keys():
            if isinstance(flattened[key], list):
                if len(flattened[key]) > max:
                    max = len(flattened[key])
        for key in flattened.keys():
            if isinstance(flattened[key], list):
                if len(flattened[key]) < max:
                    for i in range(max - len(flattened[key])):
                        flattened[key].append(None)
        return flattened
    
    
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys, v, flattened)
        else:
            jkey = '__'.join(keys)
            if not flattened.get(jkey) is None:
                if isinstance(flattened[jkey], list):
                    flattened[jkey] = flattened[jkey] + [value]
                else:
                    flattened[jkey] = [flattened[jkey]] + [value]
            else:
                flattened[jkey] = value
    
    
    def flatten_json(json):
        flattened_result = {}
        json_list = []
        if isinstance(json, dict):
            json_list.append(json)
        elif isinstance(json, list):
            json_list = json
        else:
            print("JSON object must be a dict or list instance, but is type " + str(type(json)))
            return {}
        for j in json_list:
            for key in j.keys():
                process_value([key], j[key], flattened_result)
        return flattened_result
    
    
    try:
        f = open("input.json", "r")
    except (FileNotFoundError, PermissionError, OSError):
        print("Error opening file")
        exit(1)
    y = json.loads(f.read())
    flat = flatten_json(y)
    df = pd.DataFrame.from_dict(same_length(flat), orient='columns')
    df.to_csv('output.csv', index=False, encoding='utf-8')
    
    
Janumar
  • 202
  • 2
  • 7
  • 1
    ValueError: arrays must all be the same length – Elvin Aghammadzada Jul 25 '22 at 18:50
  • I decided to make a repository for the solution: https://github.com/networkinss/pysolution Feel free to make a PR for improvements or other Python solutions. It can cope with json array, null values. You need to use recursion to copy with nested elements. – Janumar Feb 28 '23 at 08:05
-1

As I mentioned in the How to convert a json response into yaml in bash the simplest way is by using yq

The source file

cat sample.json
[
  {"name": "Abc", "id": "10"},
  {"name": "Def", "id": "11"},
  {"name": "Xyz", "id": "12"}
]

As yaml

cat sample.json | yq -P
- name: Abc
  id: "10"
- name: Def
  id: "11"
- name: Xyz
  id: "12"

As csv

cat sample.json | yq e '.[0, 1] | [key+1, .id, .name] | @csv'
1,10,Abc
2,11,Def

Note: you can extract a range of records with [0, 1] and prepend a key in the output.

Andrei Sura
  • 2,465
  • 1
  • 20
  • 15
-2

I might be late to the party, but I think, I have dealt with the similar problem. I had a json file which looked like this

JSON File Structure

I only wanted to extract few keys/values from these json file. So, I wrote the following code to extract the same.

    """json_to_csv.py
    This script reads n numbers of json files present in a folder and then extract certain data from each file and write in a csv file.
    The folder contains the python script i.e. json_to_csv.py, output.csv and another folder descriptions containing all the json files.
"""

import os
import json
import csv


def get_list_of_json_files():
    """Returns the list of filenames of all the Json files present in the folder
    Parameter
    ---------
    directory : str
        'descriptions' in this case
    Returns
    -------
    list_of_files: list
        List of the filenames of all the json files
    """

    list_of_files = os.listdir('descriptions')  # creates list of all the files in the folder

    return list_of_files


def create_list_from_json(jsonfile):
    """Returns a list of the extracted items from json file in the same order we need it.
    Parameter
    _________
    jsonfile : json
        The json file containing the data
    Returns
    -------
    one_sample_list : list
        The list of the extracted items needed for the final csv
    """

    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In few json files, the race was not there so using KeyError exception to add '' at the place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # will add an empty string in case race is not there.
    data_list.append(data['name'])

    return data_list


def write_csv():
    """Creates the desired csv file
    Parameters
    __________
    list_of_files : file
        The list created by get_list_of_json_files() method
    result.csv : csv
        The csv file containing the header only
    Returns
    _______
    result.csv : csv
        The desired csv file
    """

    list_of_files = get_list_of_json_files()
    for file in list_of_files:
        row = create_list_from_json(f'descriptions/{file}')  # create the row to be added to csv for each file (json-file)
        with open('output.csv', 'a') as c:
            writer = csv.writer(c)
            writer.writerow(row)
        c.close()


if __name__ == '__main__':
    write_csv()

I hope this will help. For details on how this code work you can check here

  • 1
    Please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) – tripleee Jun 30 '21 at 09:43
  • 1
    This "specialized" code helps no one. When u post, post a working solution for everyone. – Elvin Aghammadzada Jul 25 '22 at 18:51