I have a 4 GB CSV file to load on my 16 GB machine. `fread` and `read.csv` can't load it in one go; both return memory errors. So I decided to read the file in chunks, which worked (after an hour or so), and I now have a list of data.frames that takes 2.5 GB if I trust the Environment tab in RStudio, and 1.2 GB when saved as an RDS.
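
Roughly what my chunked import looks like (the file name, chunk size and header handling here are placeholders, not my exact code):

```r
library(data.table)

path       <- "big_file.csv"                      # placeholder file name
col_names  <- names(fread(path, nrows = 0L))      # read only the header
chunk_rows <- 1e6L                                # rows per chunk

chunks <- list()
offset <- 1L                                      # skip the header line
repeat {
  chunk <- tryCatch(
    fread(path, skip = offset, nrows = chunk_rows,
          header = FALSE, col.names = col_names),
    error = function(e) NULL                      # skip past EOF -> stop
  )
  if (is.null(chunk) || nrow(chunk) == 0L) break
  chunks[[length(chunks) + 1L]] <- chunk
  offset <- offset + nrow(chunk)
}
```
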
The issue I have now is concatenating everything back into one big data.frame. From what I understand, `rbindlist` is the most efficient option (or is it `bind_rows`?), but in my case it still uses too much memory.

I think I could work around this by running `rbindlist` on the list items `n` by `n`, then recursively, until I end up with my final table; a sketch of this idea is below. That `n` would have to be calibrated by hand though, and the whole process is really ugly (on top of the already annoying CSV import).
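
Something like this, where the helper name and the batch size are just placeholders:

```r
library(data.table)

# Bind n chunks at a time, null out the pieces just bound so gc() can
# reclaim them, and repeat until a single data.table is left.
bind_in_batches <- function(chunks, n = 10L) {
  while (length(chunks) > 1L) {
    out <- vector("list", ceiling(length(chunks) / n))
    for (i in seq_along(out)) {
      idx         <- ((i - 1L) * n + 1L):min(i * n, length(chunks))
      out[[i]]    <- rbindlist(chunks[idx])
      chunks[idx] <- list(NULL)   # release the originals before the next batch
      gc()
    }
    chunks <- out
  }
  chunks[[1L]]
}

big_dt <- bind_in_batches(chunks, n = 10L)
```
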
Another idea that crossed my mind is to feed an SQLite database from the data I have already loaded, and then query it from R (I will only do `subset`, `min` and `max` operations on the data).
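
If I went that route, I imagine something along these lines with DBI/RSQLite (the file, table and column names are made up):

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "my_data.sqlite")

# Append each in-memory chunk to an on-disk table.
for (chunk in chunks) {
  dbWriteTable(con, "measurements", chunk, append = TRUE)
}

# The only operations I need, pushed down to the database.
dbGetQuery(con, "SELECT MIN(x) AS min_x, MAX(x) AS max_x FROM measurements")
dbGetQuery(con, "SELECT * FROM measurements WHERE x BETWEEN 0 AND 10")

dbDisconnect(con)
```
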
Can I do better than this?

My data is made only of `integer` and `double` columns, if that makes a difference.