1

Assume I have multiple lists of dict's, something along the lines of

list_one = [{'genre': 'Action', 'amount': 141, 'meanScore': 82}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

list_two = [{'genre': 'Horror', 'amount': 11, 'meanScore': 62}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

My goal is to write this to a file in the form

           Action       Comedy       Horror      
list_one  meanScore   meanScore    
           amount       amount       
list_two              meanScore     meanScore
                        amount       amount

I'm not really familiar with dict and what the best way is to store them, but it seems that csv- files are quite popular for that. I tried to use this answer here to solve my problem, but I'm having difficulty in understanding what @MarkLongair does and how you would expand that to my problem. One of the main things that concerns me is that not every genre is part of every list, so I don't know how to check in the existing csv file if the key exists, where it is located and how to write the value into the right column.

Since I couldn't really understand the linked answer I tried something along the lines of

from pandas import DataFrame

list_one = [{'genre': 'Action', 'amount': 141, 'meanScore': 82},
            {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

list_two = [{'genre': 'Horror', 'amount': 11, 'meanScore': 62}, 
            {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

DataFrame(list_one).to_csv('test.csv')
DataFrame(list_two).to_csv('test.csv')

which doesn't really work since the data get's overwritten and the things I wanted to be the columns get transformed to rows...

I'm not sure how to go on form here or what exactly the right direction is... Can somebody maybe help a bit?

  • The format you are after is not valid CSV though. – Selcuk Dec 20 '18 at 23:00
  • @Selcuk Thanks for the comment. This is because you can't leave some "cells" empty, right? In that case, which format would you suggest? – Marius Jaeger Dec 20 '18 at 23:03
  • Oh, that you work in pandas is very import information since pandas is different from Python's native libraries. – Falk Schuetzenmeister Dec 20 '18 at 23:15
  • @FalkSchuetzenmeister I just used pandas since it is the only tool I am even slightly familiar with when it comes to dealing with data... I'm happy to switch if that helps solving the problem! – Marius Jaeger Dec 20 '18 at 23:17
  • Is that format a must or does any human readable format do? – Selcuk Dec 20 '18 at 23:19
  • No it does not help, it is just another solution. But I think you should familiarize yourself with the Python standard libraries to use Pandas effectively. – Falk Schuetzenmeister Dec 20 '18 at 23:19
  • @Selcuk any format that can later be used to read the data in again will do. – Marius Jaeger Dec 20 '18 at 23:20
  • In that case using `yaml` is not a bad idea. It is relatively readable and also can be read back using the same library. See https://pyyaml.org/wiki/PyYAMLDocumentation – Selcuk Dec 20 '18 at 23:23

2 Answers2

1

One way to get around this without using Pandas [EDIT: I see you've since edited to mention this] is to make a function that looks at one of your dictionaries, and composes the appropriate line of CSV text.

def generate_row(separator, headers, data_type, data_list, list_name):
    data_by_genre = {k: '' for k in headers}
    for data in data_list:
        data_by_genre[data['genre']] = str(data[data_type])

    output_text = separator.join([data_by_genre[genre] for genre in headers]) + '\n'
    # If it's 'amount', then the row starts with the name. Otherwise that space is blank.
    if data_type == 'amount':
        output_text = list_name + output_text

    return output_text


list_one = [{'genre': 'Action', 'amount': 141, 'meanScore': 82}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]
list_two = [{'genre': 'Horror', 'amount': 11, 'meanScore': 62}, {'genre': 'Comedy', 'amount': 191, 'meanScore': 82}]

headers = ['', 'Action', 'Comedy', 'Horror']
separator = ','

f = open('new.csv', 'w')
f.write(separator.join(headers))
f.write('\n')
f.write(generate_row(separator, headers, 'amount', list_one, 'list_one'))
f.write(generate_row(separator, headers, 'meanScore', list_one, 'list_one'))
f.write(generate_row(separator, headers, 'amount', list_two, 'list_two'))
f.write(generate_row(separator, headers, 'meanScore', list_two, 'list_two'))
f.close()

I made "separator" a variable in case you want to use e.g. tab-delimited instead of commas.

If you want to use Pandas though, you can write something to reformat your data to look like this, so it writes "correctly".

data1 = [{'Action': 141, 'Comedy': 191, 'Horror': None},
         {'Action': 82, 'Comedy': 82, 'Horror': None},
         {'Action': None, 'Comedy': 191, 'Horror': 11},
         {'Action': None, 'Comedy': 82, 'Horror': 62}]

DataFrame(data1).to_csv('test.csv')
Bill M.
  • 1,388
  • 1
  • 8
  • 16
0

In the first version of your question, you did not mention that you are operating within pandas which is really distinct from the Python standard libraries and important information. Pandas is not really needed to do this but I assume that you are using pandas for other reasons.

DataFrame(list1 + list2).to_csv('test.csv')

See also

How to add pandas data to an existing csv file?

if you want to append while writing rather combining the lists before turning into dataframe.

Other solutions outside pandas would be csv.DictWriter in the csv library or JSON serialization (if CSV is not a requirement).

Falk Schuetzenmeister
  • 1,497
  • 1
  • 16
  • 35
  • I'm sorry but this doesn't really solve the problem. If you run the code you will see that you now have two rows with the entry comedy, instead of one column. – Marius Jaeger Dec 20 '18 at 23:23
  • 2
    Well Stackoverflow is not really the place where people do your work. You should look into the pandas documentation how to summarize on unique keys. – Falk Schuetzenmeister Dec 20 '18 at 23:27