Extracting variable names and data from csv file using Python

Question

I have a csv file that has each line formatted with the line name followed by 11 pieces of data. Here is an example of a line.

CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64

There are 12 lines in total, each with a unique name and data.

What I would like to do is extract the first cell from each line and use that to name the corresponding data, either as a variable equal to a list containing that line's data, or maybe as a dictionary, with the first cell being the key.

I am new to reading files, so the furthest I have gotten is reading the file in with the stock example from the documentation:

import csv

path = r'data.csv'

with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile,delimiter=' ')
    for row in reader:
        print(row[0])

I am failing to figure out how to assign each row to a new variable, especially when I am not sure what the variable names will be (this is because the csv file will be created by a user other than myself).

The destination for this data is a tool that I have written. It accepts lists as input such as...

CW1 = [0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64]

so this would be the ideal end solution. If it is easier, and considered better to have the output of the file read be in another format, I can certainly re-write my tool to work with that data type.

NDevox · Answer 1 · 2015-06-08T15:03:02.750

You need to use a dict for these kinds of things (dynamic variables):

import csv

path = r'data.csv'

data = {}

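with open(path,'rb') as csvFile:  # 'rb' is for Python 2; on Python 3 use open(path, newline='')
    reader = csv.reader(csvFile,delimiter=',')  # the sample line is comma-separated
    for row in reader:
        data[row[0]] = row[1:]  # first cell is the key; values are still strings here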

Dicts are especially useful for dynamically named data like this and are the best way to store it. To access a row, you just need to use:

data['CW1']

This solution also means that if you add any extra rows in with new names, you won't have to change anything.
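As a minimal sketch of this approach in Python 3, assuming the file is comma-separated like the sample line and that every cell after the name is numeric, the values can be converted to floats while the dict is built. The `load_rows` helper and the in-memory sample are illustrative only (the `CW2` row is made up):

```python
import csv
import io


def load_rows(csv_file):
    """Map the first cell of each row to a list of floats from the remaining cells."""
    data = {}
    for row in csv.reader(csv_file, delimiter=','):
        if not row:  # skip blank lines
            continue
        data[row[0]] = [float(cell) for cell in row[1:]]
    return data


# Hypothetical sample standing in for data.csv; the CW2 row is invented.
sample = io.StringIO(
    "CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64\n"
    "CW2,1,0.5,0.25\n"
)
data = load_rows(sample)
print(data['CW1'])
```

With a real file, the same function could be called as `load_rows(csv_file)` inside `with open(path, newline='') as csv_file:`.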

If you are desperate to have the variable names in the global namespace and not within a dict, use exec (N.B. IF ANY OF THIS USES INPUT FROM OUTSIDE SOURCES, USING EXEC/EVAL CAN BE HIGHLY DANGEROUS (rm * level) SO MAKE SURE ALL INPUT IS CONTROLLED AND UNDERSTOOD BY YOURSELF).

with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile,delimiter=',')
    for row in reader:
        # Builds and runs a statement such as: CW1 = ['0', '-0.38', ...]
        exec("{} = {}".format(row[0], row[1:]))
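If the names really must exist as variables, a somewhat less risky sketch than `exec` is to validate each name and write it into a namespace dict directly. This is an illustration rather than part of the answer above: `bind_rows` is a hypothetical helper, the rows are made up, and Python 3 is assumed.

```python
import keyword


def bind_rows(rows, namespace):
    """Store each row's data in namespace under the row's first cell.

    Names that are not valid Python identifiers are rejected, never executed.
    """
    for row in rows:
        name, values = row[0], row[1:]
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError("not a usable variable name: {!r}".format(name))
        namespace[name] = values


# Illustrative rows, as csv.reader would yield them (all cells are strings).
rows = [["CW1", "0", "-0.38"], ["CW2", "1", "0.5"]]
ns = {}
bind_rows(rows, ns)  # pass globals() instead of ns to create module-level names
print(ns["CW1"])
```

Because nothing from the file is ever executed as code, a malicious first cell can at worst raise a `ValueError`.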

Thanks for the good insight. I would upvote if I had the necessary rep. — regularGuy, Jun 08 '15 at 15:12

Rick · Accepted Answer · 2015-06-08T15:10:16.757

As Scironic said in their answer, it is best to use a dict for this sort of thing.

However, be aware that plain dict objects do not guarantee any "order" in older Python versions (insertion order is only guaranteed from Python 3.7 on), so the order of the rows can be lost if you use one. If this is a problem, you can use an OrderedDict instead (which is just what it sounds like: a dict that "remembers" the order of its contents):

import csv
from collections import OrderedDict as od

path = r'data.csv'

data = od() # ordered dict object remembers the order in the csv file

with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter = ',') # the sample line is comma-separated
    for row in reader:
        data[row[0]] = row[1:] # Slice the row up into 0 (first item) and 1: (remaining)

Now if you go looping through your data object, the contents will be in the same order as in the csv file:

for d in data.values():
    myspecialtool(*d)
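A small self-contained sketch of the ordered loop, assuming Python 3. Note that `myspecialtool(*d)` unpacks each list into separate positional arguments; since the question describes a tool that takes a list, this sketch passes each list whole. `my_tool` is a hypothetical stand-in for the real tool and the rows are made up:

```python
from collections import OrderedDict


def my_tool(values):
    """Hypothetical stand-in for the real tool: returns the sum of a row."""
    return sum(values)


data = OrderedDict()
data["CW2"] = [1.0, 0.5]
data["CW1"] = [0.0, -0.38, 2.04]

# items() yields (name, values) pairs in insertion order.
results = [(name, my_tool(values)) for name, values in data.items()]
names = [name for name, _ in results]
print(names)
```

Iterating over `items()` rather than `values()` keeps each row's name available alongside its data.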

Thank you for catching the ordering issue. That definitely saved me some headache. — regularGuy, Jun 08 '15 at 15:12

score 0 · Answer 3 · edited May 23 '17 at 12:14

In Python, you can use slicing: row[1:] contains the row without its first element, so you could do:

>>> import csv
>>> d = {}
>>> with open("f") as f:
...     c = csv.reader(f, delimiter=',')
...     for r in c:
...         d[r[0]] = list(map(int, r[1:]))  # use float instead of int for decimal data
...
>>> d
{'var1': [1, 3, 1], 'var2': [3, 0, -1]}

Regarding variable variables, check "How do I do variable variables in Python?" or "How to get a variable name as a string in Python?". I would stick to a dictionary though.

score 0 · Answer 4 · answered Jun 08 '15 at 15:57

An alternative to using the proper csv library could be as follows:

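path = r'data.csv'
with open(path, "r") as csvFile:    # the with block closes the file afterwards
    csvRows = csvFile.readlines()

# Drop the first column (the name) and convert the remaining cells to floats
dataRows = [[float(col) for col in row.rstrip("\n").split(",")[1:]] for row in csvRows]

for dataRow in dataRows:        # Where dataRow is a list of numbers
    print(dataRow)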

You could then call your function where the print statement is.

This reads the whole file in and produces a list of lines with trailing newlines. It then removes each newline and splits each row into a list of strings. It skips the initial column and calls float() on each remaining entry, resulting in a list of lists. Whether this is enough depends on how important the first column is to you.
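If the first column does matter, the same split-based idea can keep it as a dict key. This is only a sketch under the assumption that no cell contains a quoted comma (plain `split(",")` does not handle quoting, which is what the csv module is for); `parse_lines` and the sample lines are hypothetical:

```python
def parse_lines(lines):
    """Return {name: [floats]} from comma-separated lines, without the csv module."""
    named_rows = {}
    for line in lines:
        cells = line.strip().split(",")
        if cells == [""]:  # skip blank lines
            continue
        named_rows[cells[0]] = [float(cell) for cell in cells[1:]]
    return named_rows


# Made-up lines, shaped like what readlines() would return.
lines = ["CW1,0,-0.38,2.04\n", "\n", "CW2,1,0.5\n"]
named_rows = parse_lines(lines)
print(named_rows["CW2"])
```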
