I have a directory with many large CSV files, all formatted the same way. Each file is too large to be loaded into memory.
My problem is that pandas.read_csv() only reads one file at a time. I want to treat all the files in the directory as one big file, as if they were joined end-to-end, so that I can read from them chunkwise without worrying about file boundaries. What is the most efficient way to do this? Performance matters a lot because the files are so large.
EDIT: I want the files to be treated as a single file because every chunk must have the same size, and that size must divide evenly into the total size of all the files combined (not into each individual file's size).
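
For reference, here is a rough sketch of the kind of behaviour I'm after, chaining per-file chunk readers and re-buffering rows so every yielded chunk has exactly the requested number of rows even across file boundaries (the helper name chunks_across_files, the glob pattern, and the process() call are just placeholders):

```python
import glob
import pandas as pd

def chunks_across_files(paths, chunksize):
    """Yield DataFrames of exactly `chunksize` rows (except possibly the last),
    reading the given CSV files as if they were concatenated end-to-end."""
    buffer = []          # partial DataFrames not yet yielded
    buffered_rows = 0
    for path in paths:
        # read each file lazily in chunks so nothing is fully loaded into memory
        for piece in pd.read_csv(path, chunksize=chunksize):
            buffer.append(piece)
            buffered_rows += len(piece)
            # emit full-size chunks as soon as enough rows are buffered,
            # even if the rows span a file boundary
            while buffered_rows >= chunksize:
                combined = pd.concat(buffer, ignore_index=True)
                yield combined.iloc[:chunksize]
                leftover = combined.iloc[chunksize:]
                buffer = [leftover] if len(leftover) else []
                buffered_rows = len(leftover)
    # whatever remains at the end is the final (possibly smaller) chunk
    if buffered_rows:
        yield pd.concat(buffer, ignore_index=True)

# treat every CSV in the directory as one logical file
for chunk in chunks_across_files(sorted(glob.glob("data/*.csv")), chunksize=100_000):
    process(chunk)   # placeholder for whatever is done with each chunk
```

I'm not sure whether re-buffering with pd.concat like this is the fastest option, which is why I'm asking about the most efficient approach.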