
I have a list of JSON objects that looks like this:

{
  "a": 1,
  "b": 2,
  "c": 3
},
{
  "a": 5,
  "b": 6,
  "c": 7,
  "d": 8
},
{
  "a": 9,
  "b": 10 
}

I want to go through it and convert it to a CSV. The problem is that the headers are inconsistent across a lot of the objects. I want the headers of the CSV file to be the keys of the object with the most keys. If one of the objects is missing a key, then it can be a null in the CSV file.

Output:

a,b,c,d
1,2,3,null
5,6,7,8
9,10,null,null
Dominus

1 Answer


Given that the variable dicts stores your input data as:

dicts = [{
  "a": 1,
  "b": 2,
  "c": 3
},
{
  "a": 5,
  "b": 6,
  "c": 7,
  "d": 8
},
{
  "a": 9,
  "b": 10 
}]
# First we accumulate the values for all so-called headers in the dicts that have them.
# acc maps each header to an integer-keyed dict that acts as a sparse array:
# its keys are the indices of the dicts and its values are what each dict held for that header.
acc = {}
for i, d in enumerate(dicts):
    for k, v in d.items():
        if k not in acc:
            acc[k] = {}

        acc[k][i] = v
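
# At this point acc maps each header to {dict index: value}; for the sample input it looks like:
# {'a': {0: 1, 1: 5, 2: 9}, 'b': {0: 2, 1: 6, 2: 10}, 'c': {0: 3, 1: 7}, 'd': {1: 8}}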

# Now we gather the accumulated values for all the headers into a linear format that can be dumped to a .csv
headers = list(acc.keys()) # in first-seen order, since dicts preserve insertion order in Python 3.7+

dump = [headers]
for i in range(len(dicts)):
    row = []
    for h in headers:
        if i in acc[h]: # if the ith dict had a value for this header
            row.append(acc[h][i])
        else:
            row.append("null")

    dump.append(row)

Usage:

for line in dump:
    print(line)

Output:

['a', 'b', 'c', 'd']
[1, 2, 3, 'null']
[5, 6, 7, 8]
[9, 10, 'null', 'null']

To write it to the .csv file, you could do:

with open('myfile.csv', 'w') as f:
    for line in dump:
        # join the values with commas; str() is needed because most of the values are ints
        f.write(','.join(str(item) for item in line) + '\n')

myfile.csv

a,b,c,d
1,2,3,null
5,6,7,8
9,10,null,null

(.csv dumping code taken from this SO answer: https://stackoverflow.com/a/28863461/12109043)
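
As an alternative, Python's built-in csv module can handle the missing keys for you: csv.DictWriter takes a restval argument that is written whenever a row dict lacks one of the fieldnames. A minimal sketch, assuming the same dicts variable and the "null" placeholder used above:

import csv

# Union of all keys, in first-seen order (dicts preserve insertion order in Python 3.7+)
fieldnames = list({k: None for d in dicts for k in d})

with open('myfile.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval='null')
    writer.writeheader()     # writes a,b,c,d
    writer.writerows(dicts)  # missing keys are filled with 'null'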

zr0gravity7