How to write dict in certain format to file

Question

Assume I have multiple lists of dict's, something along the lines of

list_one = [{'genre': 'Action', 'amount': 141, 'meanScore': 82}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

list_two = [{'genre': 'Horror', 'amount': 11, 'meanScore': 62}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

My goal is to write this to a file in the form

           Action       Comedy       Horror      
list_one  meanScore   meanScore    
           amount       amount       
list_two              meanScore     meanScore
                        amount       amount

I'm not really familiar with dict and what the best way is to store them, but it seems that csv- files are quite popular for that. I tried to use this answer here to solve my problem, but I'm having difficulty in understanding what @MarkLongair does and how you would expand that to my problem. One of the main things that concerns me is that not every genre is part of every list, so I don't know how to check in the existing csv file if the key exists, where it is located and how to write the value into the right column.

Since I couldn't really understand the linked answer I tried something along the lines of

from pandas import DataFrame

list_one = [{'genre': 'Action', 'amount': 141, 'meanScore': 82},
            {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

list_two = [{'genre': 'Horror', 'amount': 11, 'meanScore': 62}, 
            {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

DataFrame(list_one).to_csv('test.csv')
DataFrame(list_two).to_csv('test.csv')

which doesn't really work since the data get's overwritten and the things I wanted to be the columns get transformed to rows...

I'm not sure how to go on form here or what exactly the right direction is... Can somebody maybe help a bit?

@Selcuk Thanks for the comment. This is because you can't leave some "cells" empty, right? In that case, which format would you suggest? — Marius Jaeger, Dec 20 '18 at 23:03
Oh, that you work in pandas is very import information since pandas is different from Python's native libraries. — Falk Schuetzenmeister, Dec 20 '18 at 23:15
@FalkSchuetzenmeister I just used pandas since it is the only tool I am even slightly familiar with when it comes to dealing with data... I'm happy to switch if that helps solving the problem! — Marius Jaeger, Dec 20 '18 at 23:17
No it does not help, it is just another solution. But I think you should familiarize yourself with the Python standard libraries to use Pandas effectively. — Falk Schuetzenmeister, Dec 20 '18 at 23:19
@Selcuk any format that can later be used to read the data in again will do. — Marius Jaeger, Dec 20 '18 at 23:20
In that case using `yaml` is not a bad idea. It is relatively readable and also can be read back using the same library. See https://pyyaml.org/wiki/PyYAMLDocumentation — Selcuk, Dec 20 '18 at 23:23

Bill M. · Accepted Answer · 2018-12-20T23:50:18.653

One way to get around this without using Pandas [EDIT: I see you've since edited to mention this] is to make a function that looks at one of your dictionaries, and composes the appropriate line of CSV text.

def generate_row(separator, headers, data_type, data_list, list_name):
    data_by_genre = {k: '' for k in headers}
    for data in data_list:
        data_by_genre[data['genre']] = str(data[data_type])

    output_text = separator.join([data_by_genre[genre] for genre in headers]) + '\n'
    # If it's 'amount', then the row starts with the name. Otherwise that space is blank.
    if data_type == 'amount':
        output_text = list_name + output_text

    return output_text


list_one = [{'genre': 'Action', 'amount': 141, 'meanScore': 82}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]
list_two = [{'genre': 'Horror', 'amount': 11, 'meanScore': 62}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

headers = ['', 'Action', 'Comedy', 'Horror']
separator = ','

f = open('new.csv', 'w')
f.write(separator.join(headers))
f.write('\n')
f.write(generate_row(separator, headers, 'amount', list_one, 'list_one'))
f.write(generate_row(separator, headers, 'meanScore', list_one, 'list_one'))
f.write(generate_row(separator, headers, 'amount', list_two, 'list_two'))
f.write(generate_row(separator, headers, 'meanScore', list_two, 'list_two'))
f.close()

I made "separator" a variable in case you want to use e.g. tab-delimited instead of commas.

If you want to use Pandas though, you can write something to reformat your data to look like this, so it writes "correctly".

data1 = [{'Action': 141, 'Comedy': 191, 'Horror': None},
         {'Action': 82, 'Comedy': 82, 'Horror': None},
         {'Action': None, 'Comedy': 191, 'Horror': 11},
         {'Action': None, 'Comedy': 82, 'Horror': 62}]

DataFrame(data1).to_csv('test.csv')

Thank you very much! This looks really nice! I'm gonna take a closer look at it tomorrow and ask if there are any questions! — Marius Jaeger, Dec 20 '18 at 23:41

Falk Schuetzenmeister · Answer 2 · 2018-12-20T23:24:17.657

0

In the first version of your question, you did not mention that you are operating within pandas which is really distinct from the Python standard libraries and important information. Pandas is not really needed to do this but I assume that you are using pandas for other reasons.

DataFrame(list1 + list2).to_csv('test.csv')

How to write dict in certain format to file

2 Answers2