
I am trying to create a function that accepts the name of a .csv data file and a list of strings representing column headings in that file, and returns a dict in which each key is a column heading and the corresponding value is a NumPy array of the values in that column of the data file.

My code right now:

import csv

def columndata(filename, columns):
    d = dict()
    for col in columns:
        with open(filename) as filein:
            reader = csv.reader(filein)
            for row in reader:
                if col in row:
                    d.append(row)  # raises AttributeError: dict has no append()
    return d

The sample CSV looks like:

test1,test2
3,2
1,5
6,47
1,4

The list of columns looks like:

cols = ['test1', 'test2']

The end result should be a dictionary like this:

{'test1': [3, 1, 6, 1], 'test2': [2, 5, 47, 4]}

2 Answers


You can use `csv.DictReader`, which parses each CSV row into a dict keyed by the column headings:

import csv
from collections import defaultdict


def parse_csv_by_field(filename, fieldnames):
    d = defaultdict(list)
    with open(filename, newline='') as csvfile:
        # Without an explicit fieldnames argument, DictReader takes the keys
        # from the header row, so columns are matched by name, not position.
        reader = csv.DictReader(csvfile)
        for row in reader:
            for field in fieldnames:
                d[field].append(float(row[field]))  # thanks to Paulo!
    return dict(d)

print(parse_csv_by_field('a.csv', fieldnames=['test1', 'test2']))
TwistedSim
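Since the question asks for NumPy arrays rather than lists, the same `DictReader` idea can collect each requested column into a list and convert it once at the end. A minimal sketch, assuming NumPy is installed; unlike the question's `columndata`, this version takes an already-open file (an assumption, so it can be tried with `io.StringIO`) rather than a filename:

```python
import csv
import io

import numpy as np


def columndata(csvfile, columns):
    """Return {heading: NumPy float array} for the requested columns.

    csvfile is any open text file (or file-like object) whose first
    row is the header.
    """
    lists = {col: [] for col in columns}
    for row in csv.DictReader(csvfile):
        for col in columns:
            lists[col].append(float(row[col]))
    # Convert each list to a NumPy array once, after all rows are read.
    return {col: np.array(values) for col, values in lists.items()}


sample = io.StringIO("test1,test2\n3,2\n1,5\n6,47\n1,4\n")
result = columndata(sample, ['test1', 'test2'])
print(result)
```

With a real file you would call it as `with open('data.csv', newline='') as f: columndata(f, cols)`, where `data.csv` is a hypothetical file name.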

A simple pandas solution:

import pandas as pd
df = pd.read_csv(filename, dtype=float)  # read every column as float
my_dict = df.to_dict(orient='list')
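If you would rather not pull in pandas just for this, NumPy's own `np.genfromtxt` can read the header row as field names (`names=True`) and return a structured array whose fields are the columns. This is a swapped-in technique, not part of the answer above; it is a sketch assuming every requested column is numeric and that the function receives an open file. Note that `genfromtxt` validates field names, so headings containing spaces or other special characters may be renamed.

```python
import io

import numpy as np


def columndata(csvfile, columns):
    # names=True: the first line supplies the field names; the default
    # dtype is float, so every field is parsed as float64.
    data = np.genfromtxt(csvfile, delimiter=',', names=True)
    # Each field of the structured array is a 1-D float array; copy()
    # gives each result its own memory instead of a view into `data`.
    return {col: data[col].copy() for col in columns}


sample = io.StringIO("test1,test2\n3,2\n1,5\n6,47\n1,4\n")
result = columndata(sample, ['test1', 'test2'])
print(result)
```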

If you want to stick with plain Python (standard library only):

import csv
with open(filename, newline='') as f:
    l = list(csv.reader(f))
    my_dict = {i[0]: [float(x) for x in i[1:]] for i in zip(*l)}
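The key step in the plain-Python version is `zip(*l)`, which transposes the list of rows into a list of columns, each starting with its heading. The walk-through below uses the question's sample data in an `io.StringIO` instead of a file, and (as an addition to the answer) keeps only the headings listed in a hypothetical `wanted` list, since the snippet above returns every column:

```python
import csv
import io

text = "test1,test2\n3,2\n1,5\n6,47\n1,4\n"
rows = list(csv.reader(io.StringIO(text)))
# rows == [['test1', 'test2'], ['3', '2'], ['1', '5'], ['6', '47'], ['1', '4']]

# zip(*rows) transposes rows into columns: each tuple starts with the
# heading, followed by that column's values (still strings).
columns = list(zip(*rows))
# columns == [('test1', '3', '1', '6', '1'), ('test2', '2', '5', '47', '4')]

wanted = ['test1', 'test2']
my_dict = {col[0]: [float(x) for x in col[1:]]
           for col in columns if col[0] in wanted}
print(my_dict)
```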

Or, if you want to be even more Pythonic (as Adam Smith suggested in the comments; Python 3 only):

import csv
with open(filename, newline='') as f:
    l = list(csv.reader(f))
    my_dict = {header: list(map(float, values)) for header, *values in zip(*l)}
Primusa
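The Python 3 variant relies on extended iterable unpacking (PEP 3132): in `header, *values`, the starred name collects everything after the first item into a list. A small sketch of that, plus a final `np.array` call to get the NumPy arrays the question asks for (assumes NumPy is installed; sample data is inlined via `io.StringIO`):

```python
import csv
import io

import numpy as np

# Extended unpacking: the starred target always receives a list.
header, *values = ('test2', '2', '5', '47', '4')
print(header, values)

text = "test1,test2\n3,2\n1,5\n6,47\n1,4\n"
rows = list(csv.reader(io.StringIO(text)))

# list(map(...)) is needed: np.array() of a bare map object would wrap
# the iterator in a 0-d object array instead of producing floats.
arrays = {h: np.array(list(map(float, vals))) for h, *vals in zip(*rows)}
print(arrays)
```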
  • In `pandas`, maybe `.to_dict(orient='list')` will give the desired result – niraj Apr 17 '18 at 00:31
  • Also, I think it is better to rename `dict` to something like `my_dict`, since `dict` shadows the built-in type. If you try `my_dict = {i[0]:list(i[1:]) for i in zip(*l)}`, the values in `my_dict` will be lists instead of tuples – niraj Apr 17 '18 at 00:36
  • Good use of the `cols = zip(*rows)` idiom. Might I suggest `my_dict = {header: values for header, *values in zip(*l)}`? (Valid only in py3) – Adam Smith Apr 17 '18 at 00:38
  • @0p3n5ourcE I don't get that as a result, but you can convert it into a list while converting to floats – Primusa Apr 17 '18 at 00:39
  • @Primusa you can preserve the `float` cast by either replacing `header: values` with `header: map(float, values)` or `header: [float(v) for v in values]` – Adam Smith Apr 17 '18 at 00:44
  • This solution doesn't work with 16 GB files; I'm looking for a way to do that without pre-loading the entire file. – JasonGenX Feb 11 '20 at 00:39
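On the large-file concern in the last comment: `list(csv.reader(f))` holds every parsed row of the file in memory at once. A sketch that streams rows with `csv.DictReader` instead, keeps only the requested columns, and stores them as packed doubles in the standard-library `array` module before handing them to NumPy. This still needs the selected columns to fit in RAM; it only avoids holding the full file and one Python float object per value. The function name `columndata_streaming` is hypothetical, and the function takes an open file:

```python
import array
import csv
import io

import numpy as np


def columndata_streaming(csvfile, columns):
    # One array('d') per requested column: values are stored as packed
    # C doubles (8 bytes each) instead of individual Python float objects.
    buffers = {col: array.array('d') for col in columns}
    # DictReader yields one row at a time, so only the current row is
    # held as Python objects while the file is read.
    for row in csv.DictReader(csvfile):
        for col in columns:
            buffers[col].append(float(row[col]))
    # np.frombuffer wraps each buffer's memory without copying it.
    return {col: np.frombuffer(buf, dtype=np.float64)
            for col, buf in buffers.items()}


sample = io.StringIO("test1,test2\n3,2\n1,5\n6,47\n1,4\n")
result = columndata_streaming(sample, ['test1', 'test2'])
print(result)
```

For a real file, open it with `newline=''` and pass the file object; the unused columns are parsed per row but never stored.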