5

I have a list of dictionaries that I want to be able to open in Excel, formatted correctly. This is what I have so far, using csv:

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path= "/docs/outfile.txt"
out_file = open(ipath, 'wb')

writer = csv.writer(ofile, dialect = 'excel')

for items in list_of_dicts:
    for k,v in items.items():
        writer.writerow([k,v])

Obviously, when I open the output in Excel, it's formatted like this:

key  value
key  value

What I want is this:

key   key   key

value value value

I can't figure out how to do this, so help would be appreciated. Also, I want the column names to be the dictionary keys, in stead of the default 'A, B, C' etc. Sorry if this is stupid.

Thanks

Yngve
  • 743
  • 4
  • 10
  • 15
  • possible duplicate of [Python Dictionary to CSV](http://stackoverflow.com/questions/8331469/python-dictionary-to-csv) – user Sep 19 '13 at 17:04
  • 3
    @user: I'm pretty sure this isn't a duplicate; this question wants to transpose rows and columns from how it's usually done, and from a brief look I don't think that other question covers that. – icktoofay Sep 20 '13 at 02:48

3 Answers3

6

The csv module has a DictWriter class for this, which is covered quite nicely in another SO answer. The critical point is that you need to know all your column headings when you instantiate the DictWriter. You could construct the list of field names from your list_of_dicts, if so your code becomes

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path= "/docs/outfile.txt"
out_file = open(out_path, 'wb')

fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')

writer.writeheader() # Assumes Python >= 2.7
for row in list_of_dicts:
    writer.writerow(row)
out_file.close()

The way I've constructed fieldnames scans the entire list_of_dicts, so it will slow down as the size increases. You should instead construct fieldnames directly from the source of your data e.g. if the source of your data is also a csv file you can use a DictReader and use fieldnames = reader.fieldnames.

You can also replace the for loop with a single call to writer.writerows(list_of_dicts) and use a with block to handle file closure, in which case your code would become

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path= "/docs/outfile.txt"

fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))

with open(out_path, 'wb') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
    writer.writeheader()
    writer.writerows(list_of_dicts)
Community
  • 1
  • 1
Alex Willmer
  • 491
  • 4
  • 14
  • Thanks. I tried your method, and it worked except for one thing, the first key-value pair turned out fine, but the in the next one, the value skipped a row. When i added more dictionaries to the test list, this pattern was clear: for each dictionary, the next value skipped one more row than the previous one. It's because too many commas were added, but my understanding of it is too poor to figure out why it happens. (This is referring to your first method. I just saw you had another one. I might try that one) – Yngve Dec 01 '12 at 16:30
  • Hmm, I think I've interpreted your question, slightly differently to what you had in mind. If `len(list_of_dicts) == 5`, how many rows are you expecting to write in total? Does each dictionary in `list_of_dicts` _always_ have a single key/value? – Alex Willmer Dec 01 '12 at 17:32
  • Nevermind, I see from the accepted answer and the subject that the answers are 2 rows, and yes - always a single key/value. I answered assuming n, and no. – Alex Willmer Dec 01 '12 at 17:40
  • Sorry, I was unclear. Martijns solution worked well on the 'test data' in the question, so I assumed it would work for the actual data as well. But when I tried your method on the actual data, with multiple key/values, it worked perfectly, for some reason. So thanks! – Yngve Dec 01 '12 at 21:00
2

You need to write 2 separate rows, one with the keys, one with the values, instead:

writer = csv.writer(ofile, dialect = 'excel')

writer.writerow([k for d in list_of_dicts k in d])
writer.writerow([v for d in list_of_dicts v in d.itervalues()])

The two list comprehensions extract first all the keys, then all the values, from the dictionaries in your input list, combining these into one list to write to the CSV file.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
0

I think that the most useful is to write the column by column, so each key is a column (good for later on data processing and use for e.g. ML).

I had some trouble yesterday figuring it out but I came up with the solution I saw on some other website. However, from what I see it is not possible to go through the whole dictionary at once and we have to divide it on smaller dictionaries (my csv file had 20k rows at the end - surveyed person, their data and answers. I did it like this:

    # writing dict to csv
    # 'cleaned' is a name of the output file 
    
    # 1 header 
    # fildnames is going to be columns names 
    
    # 2 create writer 
    writer = csv.DictWriter(cleaned, d.keys())
    
    # 3 attach header 
    writer.writeheader()
    
    # write separate dictionarties 
    for i in range(len(list(d.values())[0])):
        
        writer.writerow({key:d[key][i] for key in d.keys()}) 

I see my solution has one more for loop but from the other hand, I think it takes less memory (but, I am not sure!!) Hope it'd help somebody ;)

Maciek Woźniak
  • 346
  • 2
  • 10