
I am used to R, which offers quick functions to read CSV files column by column. Can anyone propose a quick and efficient way to read large data files (CSV, for example) in Python, e.g. to read just the i-th column of a CSV file?

I have the following, but it's slow:

    import csv

    f = open('some.csv', newline='')   # text mode (Python 3), not 'rb'
    reader = csv.reader(f, delimiter=',')
    header = next(reader)              # reader.next() is Python 2 only
    zipped = list(zip(*reader))        # transpose rows into columns
    print(zipped[0])                   # the first column

Is there a better way to read data from large files in Python, at least as quick and memory-efficient as R?


2 Answers


You can also use pandas.read_csv and its usecols argument; see the pandas documentation for read_csv.

    import pandas as pd

    # usecols selects only the named columns while parsing
    data = pd.read_csv('some.csv', usecols=['col_1', 'col_2', 'col_4'])
...
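For files too large to hold in memory at once, a minimal sketch (reusing the hypothetical column names above) combines usecols with chunksize so pandas parses the file in pieces:

    import pandas as pd

    # Parse only the wanted column, 100,000 rows at a time
    chunks = pd.read_csv('some.csv', usecols=['col_1'], chunksize=100000)

    # Stitch the per-chunk Series back into one column
    col_1 = pd.concat(chunk['col_1'] for chunk in chunks)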
    import csv

    with open('some.csv', newline='') as fin:
        reader = csv.reader(fin)
        header = next(reader)                   # skip the header row
        first_col = [row[0] for row in reader]  # keep only the first column

What you're doing with zip is loading the entire file into memory and then transposing it to get the columns. If you only want one column's values, just build that list to start with.

If you want multiple columns, you can do:

    from operator import itemgetter

    get_cols = itemgetter(1, 3, 5)
    # list comprehension materializes the result; in Python 3, map is lazy
    cols = [get_cols(row) for row in reader]
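
If the columns are identified by header name rather than position, a minimal sketch (the column names here are hypothetical) looks up the indices from the header row first:

    import csv
    from operator import itemgetter

    with open('some.csv', newline='') as fin:
        reader = csv.reader(fin)
        header = next(reader)
        # Map the hypothetical names to their positions in the header
        get_cols = itemgetter(header.index('col_1'), header.index('col_4'))
        cols = [get_cols(row) for row in reader]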