
I'm using this answer on how to read only a chunk of CSV file with pandas.

The suggestion to use `pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)` works well, but it returns a `<class 'pandas.io.parsers.TextFileReader'>`, so I'm converting it to a DataFrame with `pd.concat(pd.read_csv('./input/test.csv', iterator=True, chunksize=25))`, but that takes as much time as reading the whole file in the first place!

Any suggestions on how to read only a chunk of the file fast?

CIsForCookies

1 Answer


`pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)` returns an iterator (a `TextFileReader`). You can use the built-in `next` function to grab the next chunk as a DataFrame:

reader = pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)

df = next(reader)  # first 1000 rows as a DataFrame; nothing else is read yet
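A minimal runnable sketch of the same idea, using an in-memory CSV (hypothetical data standing in for `./input/test.csv`) to show that `next` yields an ordinary `DataFrame` of at most `chunksize` rows:

```python
import io
import pandas as pd

# In-memory stand-in for './input/test.csv' (hypothetical data)
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

reader = pd.read_csv(csv_data, iterator=True, chunksize=4)

first_chunk = next(reader)   # only the first 4 rows are parsed
print(type(first_chunk))     # <class 'pandas.core.frame.DataFrame'>
print(len(first_chunk))      # 4
```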

The iterator is also commonly consumed in a `for` loop, processing one chunk at a time:

for df in pd.read_csv('./input/test.csv', iterator=True, chunksize=1000):
    pass  # process each 1000-row chunk here
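A runnable sketch of the loop pattern (again with an in-memory CSV as a hypothetical stand-in for the file), aggregating a column chunk by chunk so only one chunk is held in memory at a time:

```python
import io
import pandas as pd

# In-memory stand-in for './input/test.csv' (hypothetical data)
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

total = 0
for df in pd.read_csv(csv_data, iterator=True, chunksize=4):
    # each df is a DataFrame of at most 4 rows; process it, then let it go
    total += df["b"].sum()

print(total)  # 90 (sum of 0, 2, ..., 18)
```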
piRSquared
  • Trying to convert the iterator to a dataframe using `pd.concat` forced it to read the whole file? – CIsForCookies May 22 '18 at 17:28
    Yes. It also highlights that you can pass iterators to `pd.concat`, which is handy to know. Using `next` on the iterator limits the reading to one chunk at a time. – piRSquared May 22 '18 at 17:30