
I am used to R, which offers quick functions to read CSV files column by column. Can anyone propose a quick and efficient way to read large data files (CSV, for example) in Python, e.g. to read just the i-th column of a CSV file?

I have the following, but it's slow:

    import csv

    f = open('some.csv', newline='')   # text mode (Python 3), not 'rb'
    reader = csv.reader(f, delimiter=',')
    header = next(reader)              # reader.next() is Python 2 only
    zipped = list(zip(*reader))        # transpose rows into columns
    print(zipped[0])                   # the first column

Is there a better way to read data from large files in Python, at least as quick and memory-efficient as R?


2 Answers


You can also use pandas.read_csv and its usecols argument; see the pandas documentation for read_csv.

    import pandas as pd

    # usecols selects only the named columns while parsing
    data = pd.read_csv('some.csv', usecols=['col_1', 'col_2', 'col_4'])
...
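For files too large to hold in memory at once, a minimal sketch (reusing the hypothetical column names above) combines usecols with chunksize so pandas parses the file in pieces:

    import pandas as pd

    # Parse only the wanted column, 100,000 rows at a time
    chunks = pd.read_csv('some.csv', usecols=['col_1'], chunksize=100000)

    # Stitch the per-chunk Series back into one column
    col_1 = pd.concat(chunk['col_1'] for chunk in chunks)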
    import csv

    with open('some.csv', newline='') as fin:
        reader = csv.reader(fin)
        header = next(reader)                   # skip the header row
        first_col = [row[0] for row in reader]  # keep only the first column

What you're doing with zip is loading the entire file into memory and then transposing it to get the columns. If you only want one column's values, just build that list to start with.

If you want multiple columns, you can do:

    from operator import itemgetter

    get_cols = itemgetter(1, 3, 5)
    # list comprehension materializes the result; in Python 3, map is lazy
    cols = [get_cols(row) for row in reader]
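
If the columns are identified by header name rather than position, a minimal sketch (the column names here are hypothetical) looks up the indices from the header row first:

    import csv
    from operator import itemgetter

    with open('some.csv', newline='') as fin:
        reader = csv.reader(fin)
        header = next(reader)
        # Map the hypothetical names to their positions in the header
        get_cols = itemgetter(header.index('col_1'), header.index('col_4'))
        cols = [get_cols(row) for row in reader]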