6

Suppose I have the list of dictionary dataset like this,

data_set = [
    {'Active rate': [0.98, 0.97, 0.96]},
    {'Operating Expense': [3.104, 3.102, 3.101]}
]

I need to iterate the list of dictionary and put the keys as column headers and its values as the rows and write it to the CSV file.

Active rate    Operating Expense
0.98           3.104
0.97           3.102
0.96           3.101

This is what I tried

data_set = [
    {'Active rate': [0.98, 0.931588, 0.941192]},
    {'Operating Expense': [3.104, 2.352, 2.304]}
]

import csv

with open('names.csv', 'w') as csvfile:
    fieldnames = ['Active rate', 'Operating Expense']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'Active rate': 0.98, 'Operating Expense': 3.102})
    writer.writerow({'Active rate': 0.97, 'Operating Expense': 3.11})
    writer.writerow({'Active rate': 0.96, 'Operating Expense': 3.109})

For brevity, I have reduced the keys to 2 and list of values to 3.

How to approach this problem?

Thanks

PyAn
  • 291
  • 4
  • 14

5 Answers5

3

The following approach should work for the data structure you have given:

import csv

data_set = [
    {'Active rate': [0.98, 0.97, 0.96]},
    {'Operating Expense': [3.104, 3.102, 3.101]}
]

fieldnames = ['Active rate', 'Operating Expense']
rows = []

for field in fieldnames:
    for data in data_set:
        try:
            rows.append(data[field])
            break
        except KeyError, e:
            pass

with open('names.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(fieldnames)
    csv_output.writerows(zip(*rows))

Giving you the following CSV output file:

Active rate,Operating Expense
0.98,3.104
0.97,3.102
0.96,3.101
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
3
d1 = {'Active rate': [0.98, 0.931588, 0.941192]}
d2 = {'Operating Expense': [3.104, 2.352, 2.304]}

with open('names.csv', 'w') as csvfile:
    fieldnames = zip(d1, d2)[0]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for row in zip(d1['Active rate'], d2['Operating Expense']):
        writer.writerow(dict(zip(fieldnames, row)))

For performance, you might want to use itertools.izip over zip depending on the length of lists.

Ozgur Vatansever
  • 49,246
  • 17
  • 84
  • 119
  • 1
    `itertools.izip` is indeed a good suggestion for larger datasets. (It would probably cause problems for replacing `zip(d1, d2)[0]` because of the index, but that shouldn't be hard to fix. – Bruno Nov 27 '15 at 10:52
3

(This answer has the disadvantage of using an external library, but)

pandas already provides extraordinarily powerful and simple tools for dealing with csv files. You can use to_csv.

Note your data structure is structured awkwardly, so we first transform it to a more intuitive structure

data_set2 = { x.keys()[0] : x.values()[0] for x in data_set }

import pandas as pd
df = pd.DataFrame(data_set2)
df.to_csv('names.csv', index = False)
Community
  • 1
  • 1
shx2
  • 61,779
  • 13
  • 130
  • 153
2
data_set = [
    {'Active rate': [0.98, 0.97, 0.96]},
    {'Operating Expense': [3.104, 3.102, 3.101]}
]

Firstly, just a quick comment, your initial data structure doesn't necessarily make sense as it is. You're using a list of dicts, but each dict seems to be using only one key, which seems to defeat its purpose.

Other data structures that would make more sense would be something like this (where each dict structure is used, as you currently have it, for one label/value pair, but at least the dict is used to tell the label and the value):

data_set = [
    {'label': 'Active rate', 'values': [0.98, 0.97, 0.96]},
    {'label': 'Operating Expense', 'values': [3.104, 3.102, 3.101]}
]

or, possibly better, an OrderedDict that give you both the order of your initial data set and the key/value mapping benefits:

from collections import OrderedDict
data_set = OrderedDict()
data_set['Active rate'] = [0.98, 0.97, 0.96]
data_set['Operating Expense'] = [3.104, 3.102, 3.101]

Of course, we don't always choose the data structures we get, so let's assume you can't change it. Your question then becomes a problem of swapping the roles of rows and columns from your initial dataset. Effectively, you want to iterate through multiple lists at the same time, and for this, zip is very useful.

import csv

fieldnames = []
val_lists = []
for d in data_set:
    # Find the only used key.
    # This is a bit awkward because of the initial data structure.
    k = d.keys()[0]
    fieldnames.append(k)
    val_lists.append(d[k])

with open('names.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)    
    writer.writerow(fieldnames)

    for row in zip(*val_lists):
        # This picks one item from each list and builds a list.
        # The first row will be [0.98, 3.104]
        # The second row will be [0.97, 3.102]
        # ...
        writer.writerow(row)

Note that there is no need for a DictWriter when you're using zip, since that would mean you need to rebuild a dict without any real benefit.

Bruno
  • 119,590
  • 31
  • 270
  • 376
  • You could also do the last part more concisely using `writerows(zip(*val_lists))` as @MartinEvans suggested. – Bruno Nov 27 '15 at 10:37
2

This code will help you without being tied to a certain number of dicts inside data_set

I've added another dict with 'Losses' key, to test

import csv

data_set = [
    {'Active rate': [0.98, 0.97, 0.96]},
    {'Operating Expense': [3.104, 3.102, 3.101]},
    {'Losses': [1.14, 2.28, 3.42]}
]

headers = [d.keys()[0] for d in data_set]

with open('names.csv', 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=headers)
    writer.writeheader()
    for item in zip(*[x.values()[0] for x in data_set]):
        more_results = list()
        more_results.append(headers)
        more_results.append(item)
        writer.writerow(dict(zip(*more_results)))

Output:

enter image description here

Andrés Pérez-Albela H.
  • 4,003
  • 1
  • 18
  • 29
  • 1
    "*This code will help you without being tied to a certain number of dicts inside data_set*": I think most other answers were already taking this into consideration. – Bruno Nov 27 '15 at 10:57
  • @Bruno I don't write code based on other answers. Thanks for your suggestion anyway. – Andrés Pérez-Albela H. Nov 27 '15 at 10:58
  • Can you make it Python 3 compatible? – PyAn Nov 27 '15 at 10:59
  • @AndrésPérez-AlbelaH, sorry, I certainly did not imply plagiarism, just saying that particular point (other dicts in `data_set`) was already handled in other answers. Your answer mainly differs from others in that it still uses a `DictWriter`. I'm not sure it's worth the trouble of rebuilding a dict and using a `DictWriter` at all, it just seems more work. – Bruno Nov 27 '15 at 11:01
  • @AndrésPérez-AlbelaH. 'zip' object is not subscriptable – PyAn Nov 27 '15 at 11:04
  • @PyAn ok, let me see. – Andrés Pérez-Albela H. Nov 27 '15 at 11:08
  • @PyAn I wouldn't necessarily use this answer because the whole stuff with `more_results` and `DictWriter` is a bit overkill, but `headers = [ d.keys()[0] for d in data_set ]` should be a suitable replacement. You'll find Python 3 uses `izip` anyway, so that's [why you can't use an index](http://stackoverflow.com/a/4989890/372643). – Bruno Nov 27 '15 at 11:11
  • 1
    @Bruno maybe you can help me modify my answer instead of encouraging him not to select my answer. – Andrés Pérez-Albela H. Nov 27 '15 at 11:13
  • @AndrésPérez-AlbelaH I've already written an answer that does exactly what you're suggesting without a `dict`. Sorry, I'm not picking on your answer (and my previous comment does suggest how to improve your answer for Python 3), it's just that it came much later than the other answers, differing only from Martin's and my answer by the fact it's using `dict`s and `DictWriter`, which I think just makes things more complicated. I'm not sure whether you or the OP realise that (but it's clear someone wants to start revenge downvoting...) – Bruno Nov 27 '15 at 11:16
  • To clarify the little point I'm making, you could get this much shorter (and without the need to rebuild a dict) like so: `writer = csv.writer(csvfile); writer.writerow(headers); writer.writerows(zip(*[x.values()[0] for x in data_set]))`. Creating `more_results`, then turning it into a dict with `dict(zip(*more_results))`, which is only required because a `DictWriter` is used seems convoluted, and can be avoided by using a normal `writer` in the first place. That's all I'm saying. – Bruno Nov 27 '15 at 11:25