
I'm using this answer on how to read only a chunk of CSV file with pandas.

The suggestion to use `pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)` works well, but it returns a `<class 'pandas.io.parsers.TextFileReader'>`, so I'm converting it to a DataFrame with `pd.concat(pd.read_csv('./input/test.csv', iterator=True, chunksize=25))`, but that takes as much time as reading the whole file in the first place!

Any suggestions on how to read only a chunk of the file fast?

CIsForCookies

1 Answer


`pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)` returns an iterator (a `TextFileReader`). You can use the built-in `next` function to grab the next chunk as a DataFrame:

reader = pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)

df = next(reader)  # first 1000 rows as a DataFrame; nothing else is read yet
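A minimal runnable sketch of the same idea, using an in-memory CSV (hypothetical data standing in for `./input/test.csv`) to show that `next` yields an ordinary `DataFrame` of at most `chunksize` rows:

```python
import io
import pandas as pd

# In-memory stand-in for './input/test.csv' (hypothetical data)
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

reader = pd.read_csv(csv_data, iterator=True, chunksize=4)

first_chunk = next(reader)   # only the first 4 rows are parsed
print(type(first_chunk))     # <class 'pandas.core.frame.DataFrame'>
print(len(first_chunk))      # 4
```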

The iterator is also commonly consumed in a `for` loop, processing one chunk at a time:

for df in pd.read_csv('./input/test.csv', iterator=True, chunksize=1000):
    pass  # process each 1000-row chunk here
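A runnable sketch of the loop pattern (again with an in-memory CSV as a hypothetical stand-in for the file), aggregating a column chunk by chunk so only one chunk is held in memory at a time:

```python
import io
import pandas as pd

# In-memory stand-in for './input/test.csv' (hypothetical data)
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

total = 0
for df in pd.read_csv(csv_data, iterator=True, chunksize=4):
    # each df is a DataFrame of at most 4 rows; process it, then let it go
    total += df["b"].sum()

print(total)  # 90 (sum of 0, 2, ..., 18)
```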
piRSquared
  • Trying to convert the iterator to a dataframe using `pd.concat` forced it to read the whole file? – CIsForCookies May 22 '18 at 17:28
    Yes. It also highlights that you can pass iterators to `pd.concat`, which is handy to know. Using `next` on the iterator limits the reading to one chunk at a time. – piRSquared May 22 '18 at 17:30