I have large CSV files, each containing more than 315 million rows and a single column. I have to process more than 50 such files at a time to get my results.
When I read more than 10 of them with a csv reader, it takes more than 12 GB of RAM and is painfully slow. I could read each file in chunks to save memory, but then I would spend even more time on I/O, since every pass over the data would have to re-read the whole file.
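This is roughly what I mean by the chunked approach (a minimal sketch using pandas; the column name `value` and the per-chunk sum are just placeholders for my real processing):

```python
import pandas as pd

def process_file(path, chunksize=1_000_000):
    # Read the single-column CSV in fixed-size chunks so only one chunk
    # is held in memory at a time.
    total = 0
    for chunk in pd.read_csv(path, header=None, names=["value"], chunksize=chunksize):
        total += chunk["value"].sum()  # placeholder for the real per-chunk work
    return total
```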
I have thought about loading the files into a database and querying the data from there, but I am not sure whether that would actually help. Can anyone please tell me the most efficient way to handle this kind of scenario in Python?
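For reference, this is roughly what I had in mind for the database idea (a sketch only; SQLite, the table names, and the column name are assumptions, not something I have settled on):

```python
import sqlite3
import pandas as pd

def load_into_sqlite(csv_paths, db_path="data.db", chunksize=1_000_000):
    # Load each CSV into its own SQLite table once, so later processing
    # can query the database instead of re-reading the raw files.
    conn = sqlite3.connect(db_path)
    for i, path in enumerate(csv_paths):
        for chunk in pd.read_csv(path, header=None, names=["value"], chunksize=chunksize):
            chunk.to_sql(f"file_{i}", conn, if_exists="append", index=False)
    conn.close()
```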