6

Context

I am looking to export a dictionary which contains a list of dictionaries as the value of each key:value pair:

dict = {'key_1':
        [{'key_a': foo_1, 'key_b': bar_1}, 
         {'key_a': foo_2, 'key_b': bar_2}], 
        'key_2':
        [{'key_c': foo_1, 'key_d': bar_1}], 
         {'key_c': foo_2, 'key_d': bar_2}] 
        ...}

The desired output will be a .csv file which has the keys of the first dictionary (key_1, key_2, etc.), as the first header row, then the keys of the nested dictionaries (key_a, key_b, etc.), as a second header row which corresponds to its respective key.

An example of the desired output looks like this, where the list index column refers to data stored within the dictionary at each respective index within the list of dictionaries:

╔════════════╦═══════════════╤═══════════════╗
║            ║     key_1     │     key_2     ║
║ List Index ╠═══════╤═══════╪═══════╤═══════╣
║            ║ key_a │ key_b │ key_c │ key_d ║
╠════════════╬═══════╪═══════╪═══════╪═══════╣
║     0      ║ foo_1 │ bar_1 │ foo_1 │ bar_1 ║
╟────────────╫───────┼───────┼───────┼───────╢
║     1      ║ foo_2 │ bar_2 │ foo_2 │ bar_2 ║
╟────────────╫───────┼───────┼───────┼───────╢
║     2      ║ foo_3 │ bar_3 │ foo_3 │ bar_3 ║
╚════════════╩═══════╧═══════╧═══════╧═══════╝

Platform: Raspberry Pi 3b+, Python 3.6


Code

Currently, I am looking into different options for doing this, so do not have any coherent code which comes near working. However, in order of preference, here are a few options which I am considering:

  • Use pandas to form an array which mirrors the nature of the desired table. Then write this to CSV directly.

  • Write to CSV from the dictionary data-structure described above.

    import csv
    
    field_names = dict.keys()
    header2 = {'%s' %dict.keys() : dict[key_1][0].keys()}
    
    with open('project_data.csv', 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=field_names)
        writer.writeheader()  # Write header row containing the top field names
        writer.writerow(header2)  #Write row containing the second field names
    

    As is evident, this code needs further development to make it work as desired.

  • Another method I have not considered?


The Question

What would be the best approach to write to a CSV in this format?

Triz
  • 173
  • 2
  • 10
  • 1
    I had solved a similar question here: https://stackoverflow.com/questions/52001109/arrays-to-row-in-pandas/52002365#52002365 – anky Sep 07 '18 at 09:33
  • @anky_91 thanks, that gives a great process for using pandas to do this. Any ideas on how to do this with multiple header rows though? This is what really has me stumped. – Triz Sep 07 '18 at 09:48

3 Answers3

2

so far I have reached here:

d = {'key_1':
    [{'key_a': 'foo_1', 'key_b': 'bar_1'}, 
     {'key_a': 'foo_2', 'key_b': 'bar_2'}], 
    'key_2':
    [{'key_c': 'foo_1', 'key_d': 'bar_1'}, 
     {'key_c': 'foo_2', 'key_d': 'bar_2'}]}
df = pd.DataFrame(d)
df1 = []
for col in df.columns:
    data = df[col].apply(pd.Series)
    data = df1.append(data)
df1 = pd.concat(df1,axis=1)
print(df1)

which gives you:

  key_a   key_b    key_c     key_d
0 foo_1   bar_1    foo_1     bar_1 
1 foo_2   bar_2    foo_2     bar_2 

Rest you have to map the respective keys in regards to your original column names and tat can be placed in df1.loc[-1] as an identifier. I will update once I get something.

anky
  • 74,114
  • 11
  • 41
  • 70
  • This works well, having looked into it further, I am not sure it is possible to create a second header row with merged cells, as desired. This is because CSV does not contain any cell-style info, so cannot merge cells. This means it may be necessary to form a new csv file for each of the top level headers (key_1, key_2, etc.). I'd be interested in any alternatives to this. – Triz Sep 07 '18 at 10:53
  • 1
    I am not sure how to format or merge csv cells, however I will try and see if I can come up with key1: key_a...etc mapping. :) – anky Sep 07 '18 at 10:55
1

Here your "List Index" is a merged cell which you can't achieve in a CSV file because it doesn't contain formatting. You can either:

a) Write it to a xlsx file (XlsxWriter is a great library for this)

b) Keep it was a CSV but with an unmerged cell as anky_91 suggests

Phillip Watts
  • 365
  • 3
  • 9
0

here's a solution for creating fieldnames for DictWriter() for a dictionary that contains dictionaries and also lists of dictionaries.

You need to walk the structure and generate the fieldnames along with a new dict that has those new names:

#!/usr/bin/env python3
import csv
import pprint as pp

myDict = {'key_1':
        [{'key_a': 'foo_1', 'key_b': 'bar_1'}, 
         {'key_a': 'foo_2', 'key_b': 'bar_2'}], 
        'key_2':
        [{'key_c': 'foo_1', 'key_d': 'bar_1'}, 
         {'key_c': 'foo_2', 'key_d': 'bar_2'}] }

def generateFieldnames(myDict):
    # create unique fieldnames from a dictionary containing dictionaries
    newDict={}
    fieldnames=[] # DictWriter will create a .csv with these header names
    
    for k,v in myDict.items():
        
        # is a dictionary?
        if (type(v) is dict):
            for kk,vv in v.items():
                print('k={0}, kk={1}, vv={2}'.format(k,kk,vv))
                name='{0}_{1}'.format(k,kk)
                fieldnames.append(name)
                newDict[name]=vv
                
        elif (type(v) is list):
            for item in range(len(v)):
                listItem=v.pop()
                if (type(listItem) is dict):
                    for kk,vv in listItem.items():
                        name='{0}_{1}'.format(k,kk)
                        fieldnames.append(name)
                        newDict[name]=vv
        
        else:
            print('k=[{0}] , v=[{1}]'.format(k,v))
            fieldnames.append(k)
            newDict[k]=v
    
    return fieldnames, newDict


# create fieldnames from the dictionary with lists and dictionaries
fieldnames, newDict=generateFieldnames(myDict)
pp.pprint(fieldnames)
print('\n')
pp.pprint(fieldnames)
print('\n\n')

# write a sample .csv with fieldnames as headers
fd = open('mytest.csv','a')
dw = csv.DictWriter( fd, fieldnames=fieldnames)

dw.writeheader() # write the header row

dw.writerow( newDict )
dw.writerow( newDict )
dw.writerow( newDict )

fd.close()

the result can be seen in the file mytest.csv:

key_1_key_a,key_1_key_b,key_1_key_a,key_1_key_b,key_2_key_c,key_2_key_d,key_2_key_c,key_2_key_d
foo_1,bar_1,foo_1,bar_1,foo_1,bar_1,foo_1,bar_1
foo_1,bar_1,foo_1,bar_1,foo_1,bar_1,foo_1,bar_1
foo_1,bar_1,foo_1,bar_1,foo_1,bar_1,foo_1,bar_1
Marc Compere
  • 290
  • 4
  • 17