
I have a directory with many large CSV files, all formatted the same way. Each file is too large to fit in memory.

My problem is that pandas.read_csv() only lets me read one file at a time. I want pandas.read_csv() to treat all the files in the directory as one big file, i.e. as if the files were joined end-to-end, so that I can read chunks from them seamlessly. How can I do this most efficiently? Performance is very important since the files are so large.

EDIT: I want the files to be treated as a single file because each chunk must have the same size, and the chunk size must evenly divide the total size of all the files (not the size of each individual file).
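To make it concrete, the per-file loop below is roughly what I can do today (the directory name `data_dir` and the chunk size of 100,000 rows are just placeholders). The problem is that the chunking restarts with every file, so the last chunk of each file is usually shorter than the rest:

```python
import glob

import pandas as pd

# Per-file chunking: chunk boundaries are tied to the individual files,
# so every file ends with a short "leftover" chunk instead of the chunks
# running seamlessly across file boundaries.
for path in sorted(glob.glob("data_dir/*.csv")):
    for chunk in pd.read_csv(path, chunksize=100_000):
        print(path, len(chunk))  # the last chunk of each file is usually short
```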

Mikkel Rev
    chunksize is a pandas.read_csv() parameter. Please go through the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html – Bikash Ranjan Bhoi Aug 14 '18 at 11:39
  • Why do you want to treat them as one big file? I would rather use `pd.read_csv(filename, chunksize=some_size)` to process them. – quest Aug 14 '18 at 11:39
  • Because each chunk must be the same size, and the chunk size must evenly divide the total size of all the files (not the individual file size) – Mikkel Rev Aug 14 '18 at 14:31
  • I think [this](https://stackoverflow.com/a/46310416/3944322) is what you're looking for (making a file-like object from your csv files to be used with read_csv). – Stef Aug 14 '18 at 16:27
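Building on Stef's suggestion above, here is a rough sketch of such a file-like wrapper. The class name `ConcatCsvReader`, the directory `data_dir`, and the chunk size of 100,000 rows are all placeholders, and it assumes every file starts with one identical header row. pandas accepts any object with a `read()` method, so the wrapper only needs to splice the files together while dropping the repeated headers:

```python
import glob
import io
import os

import pandas as pd


class ConcatCsvReader(io.RawIOBase):
    """Present several CSV files as one continuous binary stream.

    The header row is kept from the first file and skipped in every
    later file, so pandas sees one coherent CSV.
    """

    def __init__(self, paths):
        self._paths = iter(paths)
        self._current = None
        self._first = True

    def readable(self):
        return True

    def _advance(self):
        """Open the next file, skipping its header row; return False when done."""
        if self._current is not None:
            self._current.close()
        path = next(self._paths, None)
        if path is None:
            self._current = None
            return False
        self._current = open(path, "rb")
        if not self._first:
            self._current.readline()  # drop the repeated header row
        self._first = False
        return True

    def readinto(self, buffer):
        # io.RawIOBase builds read() on top of readinto(), which is the
        # only method pandas' parser needs from a file-like object.
        while True:
            if self._current is None and not self._advance():
                return 0
            n = self._current.readinto(buffer)
            if n:
                return n
            if not self._advance():
                return 0


paths = sorted(glob.glob(os.path.join("data_dir", "*.csv")))  # placeholder directory
for chunk in pd.read_csv(ConcatCsvReader(paths), chunksize=100_000):
    print(len(chunk))  # every chunk has 100,000 rows except possibly the last
```

Because the concatenated stream is handed to a single `pd.read_csv()` call, the chunk boundaries are determined by the total number of rows across all files rather than by any individual file.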

0 Answers