2

I have a file 'data.csv' that looks something like

ColA, ColB, ColC
1,2,3
4,5,6
7,8,9

I want to open and read the file columns into lists, with the 1st entry of that list omitted, e.g.

dataA = [1,4,7]
dataB = [2,5,8]
dataC = [3,6,9]

In reality there are more than 3 columns and the lists are very long, this is just an example of the format. I've tried:

csv_file = open('data.csv','rb')
csv_array = []

for row in csv.reader(csv_file, delimiter=','):
    csv_array.append(row)

Where I would then allocate each index of csv_array to a list, e.g.

dataA = [int(i) for i in csv_array[0]]

But I'm getting errors:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Also it feels like a very long winded way of just saving data to a few lists...

Thanks!

edit:

Here is how I solved it:

import pandas as pd

df = pd.read_csv('data.csv', names = ['ColA','ColB','ColC']

dataA = map(int,(df.ColA.tolist())[1:3])

and repeat for the rest of the columns.

rh1990
  • 880
  • 7
  • 17
  • 32
  • If you don't mind using imports, I would suggest using tablib which you can find here : http://docs.python-tablib.org/en/master/ – rak007 Oct 13 '17 at 14:50

6 Answers6

4

Just to spell this out for people trying to solve a similar problem, perhaps without Pandas, here's a simple refactoring with comments.

import csv

# Open the file in 'r' mode, not 'rb'
csv_file = open('data.csv','r')
dataA = []
dataB = []
dataC = []

# Read off and discard first line, to skip headers
csv_file.readline()

# Split columns while reading
for a, b, c in csv.reader(csv_file, delimiter=','):
    # Append each variable to a separate list
    dataA.append(a)
    dataB.append(b)
    dataC.append(c)

This does nothing to convert the individual fields to numbers (use append(int(a)) etc if you want that) but should hopefully be explicit and flexible enough to show you how to adapt this to new requirements.

tripleee
  • 175,061
  • 34
  • 275
  • 318
3

To skip the header, create your reader on a seperate line. Then to convert from a list of rows to a list of columns, use zip():

import csv

with open('data.csv', 'rb') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)
    data = zip(*[map(int, row) for row in csv_input])

print data

Giving you:

[(1, 4, 7), (2, 5, 8), (3, 6, 9)]

So if needed:

dataA = data[0]
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
3

Use Pandas:

import pandas as pd

df = pd.DataFrame.from_csv(path)
rows = df.apply(lambda x: x.tolist(), axis=1)
Jurgy
  • 2,128
  • 1
  • 20
  • 33
  • 1
    Why use pandas when csv comes as `batteries included` ? – Thomas Junk Oct 13 '17 at 14:56
  • Deprecated since version 0.21.0: Use pandas.read_csv() instead. [source](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.from_csv.html) For example: `pd.read_csv(path)` – Andy Mar 29 '21 at 13:27
0

Seems like you have OSX line endings in your csv file. Try saving the csv file as "Windows Comma Separated (.csv)" format.

There are also easier ways to do what you're doing with the csv reader:

csv_array = []
with open('data.csv', 'r') as csv_file:
    reader = csv.reader(csv_file)
    # remove headers
    reader.next() 
    # loop over rows in the file, append them to your array. each row is already formatted as a list.
    for row in reader:
        csv_array.append(row)

You can then set dataA = csv_array[0]

pythomatic
  • 647
  • 4
  • 13
  • the file was exported from excel, it's just a plain .csv file as far as I can tell... – rh1990 Oct 13 '17 at 14:51
  • Ok this is almost working, but I'm getting the rows instead of the columns, i.e. `dataA = [1,2,3]` and I want `dataA=[1,4,7]` – rh1990 Oct 13 '17 at 14:59
  • Sorry for missing that part of the question. You can just use `zip` to transpose it: `csv_array = zip(csv_array[0], csv_array[1], csv_array[2]`. There might be easier ways, but this I think is the most verbose and easy to get. – pythomatic Oct 13 '17 at 15:03
0

First if you read the csv file with csv.reader(csv_file, delimiter=','), you will still read the header.

csv_array[0] will be the header row -> ['ColA', ' ColB', ' ColC']

Also if you're using mac, this issues is already referenced here: CSV new-line character seen in unquoted field error

And I would recommend using pandas&numpy instead if you will do more analysis using the data. It read the csv file to pandas dataframe. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Clmelissa
  • 53
  • 7
0

use csv.DictReader() to select specific columns

dataA = []
dataB = []
with open('data.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')    
    for row in csv_reader:
        dataA.append(row['ColA'])
        dataB.append(row['ColB'])
Ajay
  • 184
  • 1
  • 7