Converting nested JSON to CSV

Question

This question has been probably asked several times here. I've been trying to make my nested JSON file flat and convert it to CSV, however the closest I could get was to list field names: MyCount, from, Mysize, Allhits, aggs, but with no values:

Output.csv:

""
Mycount
from
Mysize
Allhits
aggs

I've been trying this code to convert JSON to CSV:

import json
import csv

def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}


with open('path/to/my/file.json') as f_input:
    json_data = json.load(f_input)

# Paresing all entries to get the complete fieldname list
fieldnames = set()

for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())

with open('/path/to/myoutput.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)

The JSON structure is like this:

{"Mycount":538,
"from":0,
"Mysize":1000,
"Allhits":[{
    "isVerified":true,
    "backgroundColor":"FF720B",
    "name":"yourShop",
    "Id":"12345678",
    "ActionItems":[{
        "subtitle":"Click here to start",
        "body":null,
        "language":"de",
        "title":"Start here",
        "isDefault":true}],
        "tintColor":"FFFFFF",
        "shoppingHours":[{"hours":[{"day":["SUNDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["MONDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["SATURDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["FRIDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["THURSDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["WEDNESDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["TUESDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]}]}],
        "LogoUrl":"https://url/to/my/logo.png",
        "coverage":[{
            "country":"*",
            "language":"*",
            "ratio":1}],
        "shoppingHours2":[{"hours":[{"day":["SUNDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["MONDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["SATURDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["FRIDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["THURSDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["WEDNESDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]},{"day":["TUESDAY"],"timeRange":[{"allDay":false,"from":25200,"to":68400}]}]}],
        "group":"shop_open",
        "timeZone":"CET",
        "phone":"+1234567890",
        "modTime":1234567890,
        "intId":"+123456789",
        "Logo2Url":"https://link/to/my/logo.png"}],
"aggs":{}}

Would this be easy to accomplish using pandas module? I'm still learning python so I would appreciate any guidance with this. The minimum what I need from this json file is to extract id, intId, name, ratio and values for this field names into CSV.

The desired output should be (alternatively it could have all field names and values and I would then be able to extract the fields I need from CSV directly):

id          intId         name    ratio
12345678    123456789   yourShop    1

This is just version for one record, but my output file has to include rows for all IDs present in JSON file.

Thanks for your help in advance!

EDIT I've tried also the following:

import json
import csv


x = '/path/to/myrecords.json'

x = json.loads(x)

f.writerow(["name", "id", "intId", "ratio"])

f = csv.writer(open("/path/to/my/output.csv", "w", newline=''))

for x in x:
    f.writerow([x["Allhits"]["name"],
                x["Allhits"]["id"],
                x["Allhits"]["ActionItems"]["intId"],
                x["Allhits"]["ActionItems"]["ratio"]])

but received this error for x = json.loads(x) step:

    Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/myusername/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Users/myusername/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/myusername/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Why to not just retrieve all required values using keys into a list and write to csv file? — Olvin Roght, Feb 23 '20 at 13:01
Does this answer your question? [How can I convert JSON to CSV?](https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv) — F.NiX, Feb 23 '20 at 13:16
@F.NiX I've tried but still struggling (see EDIT paragraph at the bottom of my post). Would you be able to help with this? — Baobab1988, Feb 23 '20 at 13:47
@Olvin, would you be able to share some code? This would help a lot considering that I'm just a beginner with Python! Thanks! — Baobab1988, Feb 23 '20 at 13:49
@Baobab1988, you're doing `for x in x`. First of all, it won't work, but even if it will, your example contains single json object, not an array. What are you trying to iterate over? — Olvin Roght, Feb 23 '20 at 13:54

score 0 · Answer 1 · edited Feb 23 '20 at 13:59

If you need to flatten whole json, including arrays, you could do something like this with recurtion:

import json
import csv

def flatten(item, prefix=None):
    result = {}
    if isinstance(item, list):
        item = { i : item[i] for i in range(0, len(item) )}
    for key, val in item.items():
        prefixed_key = f"{prefix}{key}" if prefix else str(key)
        if isinstance(val, list) or isinstance(val, dict):
            result = {**result, **flatten(val, f"{prefixed_key}_")}
        else:
            result[prefixed_key] = val

    return result


with open('test.json') as f_input, open('result.csv', 'w', newline='') as f_output:
    writer = csv.writer(f_output)
    hits = json.load(f_input)["Allhits"]
    header_written = False
    for hit in hits:
        flat = flatten(hit)
        if not header_written:
            writer.writerow(flat.keys())
            header_written = True
        writer.writerow(flat.values())

with this you'd get this csv monstrocity:

isVerified,backgroundColor,name,Id,ActionItems_0_subtitle,ActionItems_0_body,ActionItems_0_language,ActionItems_0_title,ActionItems_0_isDefault,tintColor,shoppingHours_0_hours_0_day_0,shoppingHours_0_hours_0_timeRange_0_allDay,shoppingHours_0_hours_0_timeRange_0_from,shoppingHours_0_hours_0_timeRange_0_to,shoppingHours_0_hours_1_day_0,shoppingHours_0_hours_1_timeRange_0_allDay,shoppingHours_0_hours_1_timeRange_0_from,shoppingHours_0_hours_1_timeRange_0_to,shoppingHours_0_hours_2_day_0,shoppingHours_0_hours_2_timeRange_0_allDay,shoppingHours_0_hours_2_timeRange_0_from,shoppingHours_0_hours_2_timeRange_0_to,shoppingHours_0_hours_3_day_0,shoppingHours_0_hours_3_timeRange_0_allDay,shoppingHours_0_hours_3_timeRange_0_from,shoppingHours_0_hours_3_timeRange_0_to,shoppingHours_0_hours_4_day_0,shoppingHours_0_hours_4_timeRange_0_allDay,shoppingHours_0_hours_4_timeRange_0_from,shoppingHours_0_hours_4_timeRange_0_to,shoppingHours_0_hours_5_day_0,shoppingHours_0_hours_5_timeRange_0_allDay,shoppingHours_0_hours_5_timeRange_0_from,shoppingHours_0_hours_5_timeRange_0_to,shoppingHours_0_hours_6_day_0,shoppingHours_0_hours_6_timeRange_0_allDay,shoppingHours_0_hours_6_timeRange_0_from,shoppingHours_0_hours_6_timeRange_0_to,LogoUrl,coverage_0_country,coverage_0_language,coverage_0_ratio,shoppingHours2_0_hours_0_day_0,shoppingHours2_0_hours_0_timeRange_0_allDay,shoppingHours2_0_hours_0_timeRange_0_from,shoppingHours2_0_hours_0_timeRange_0_to,shoppingHours2_0_hours_1_day_0,shoppingHours2_0_hours_1_timeRange_0_allDay,shoppingHours2_0_hours_1_timeRange_0_from,shoppingHours2_0_hours_1_timeRange_0_to,shoppingHours2_0_hours_2_day_0,shoppingHours2_0_hours_2_timeRange_0_allDay,shoppingHours2_0_hours_2_timeRange_0_from,shoppingHours2_0_hours_2_timeRange_0_to,shoppingHours2_0_hours_3_day_0,shoppingHours2_0_hours_3_timeRange_0_allDay,shoppingHours2_0_hours_3_timeRange_0_from,shoppingHours2_0_hours_3_timeRange_0_to,shoppingHours2_0_hours_4_day_0,shoppingHours2_0_hours_4_timeRange_0_allDay,shoppingHours2_0_hours_4_timeRange_0_from,shoppingHours2_0_hours_4_timeRange_0_to,shoppingHours2_0_hours_5_day_0,shoppingHours2_0_hours_5_timeRange_0_allDay,shoppingHours2_0_hours_5_timeRange_0_from,shoppingHours2_0_hours_5_timeRange_0_to,shoppingHours2_0_hours_6_day_0,shoppingHours2_0_hours_6_timeRange_0_allDay,shoppingHours2_0_hours_6_timeRange_0_from,shoppingHours2_0_hours_6_timeRange_0_to,group,timeZone,phone,modTime,intId,Logo2Url
True,FF720B,yourShop,12345678,Click here to start,,de,Start here,True,FFFFFF,SUNDAY,False,25200,68400,MONDAY,False,25200,68400,SATURDAY,False,25200,68400,FRIDAY,False,25200,68400,THURSDAY,False,25200,68400,WEDNESDAY,False,25200,68400,TUESDAY,False,25200,68400,https://url/to/my/logo.png,*,*,1,SUNDAY,False,25200,68400,MONDAY,False,25200,68400,SATURDAY,False,25200,68400,FRIDAY,False,25200,68400,THURSDAY,False,25200,68400,WEDNESDAY,False,25200,68400,TUESDAY,False,25200,68400,shop_open,CET,+1234567890,1234567890,+123456789,https://link/to/my/logo.png

But if you only need specific keys, you can just iterate through your Allhits and retrieve whats needed like this:

with open('test.json') as f_input, open('result.csv', 'w', newline='') as f_output:
    writer = csv.writer(f_output)
    hits = json.load(f_input)["Allhits"]
    writer.writerow(["Id", "intId", "name", "ratio"])
    for hit in hits:
        writer.writerow([hit["Id"], hit["intId"], hit["name"], hit["coverage"][0]["ratio"]])

score 0 · Accepted Answer · answered Feb 23 '20 at 13:56

Try this. This will loop through the Allhits list and get the minimum set of data you need:

import json
import csv

with open('/path/to/myrecords.json') as f_input:
    json_data = json.load(f_input)

with open('/path/to/my/output.csv', 'w', newline='') as f_output:
    csv_output  = csv.writer(f_output)
    csv_output.writerow(["id", "intId", "name", "ratio"])
    for x in json_data['Allhits']:
        csv_output.writerow([x["Id"], x["intId"], x["name"], x["coverage"][0]["ratio"]])

@Baobab1988 Good to hear that. You were nearly there. – jignatius Feb 23 '20 at 14:22 — jignatius, Feb 23 '20 at 14:22

score 0 · Answer 3 · answered Feb 23 '20 at 14:08

x = json.loads(x)

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This is because json.loads expects a string containing the json data while the file name is passed as x.

As "Allhits":[{...}] is a single element list containing a dictionary replace x["Allhits"]["name"] with x["Allhits"][0]["name"]. Similarly when accessing the other elements like "Id".

Converting nested JSON to CSV

3 Answers3