I have a dataframe composed of 25 columns and ~1M rows, split across 12 files. I need to import them and then use some reshape package to do some data management. The files are too large to fit in RAM, so I have to look for an out-of-memory solution for importing and processing the data. Currently I don't need to run any regressions; I will only compute some descriptive statistics on the dataframe.
I searched a bit and found two packages: `ff` and `filehash`. I read the `filehash` manual first and found that it seems simple: just add some code for importing the dataframe into a file, and the rest looks similar to the usual R operations.
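For reference, this is roughly the pattern I am using with `filehash` (the file names and key names below are made up):

```r
library(filehash)

dbCreate("mydata.db")          # one-time: create the backing database file
db <- dbInit("mydata.db")      # open a handle to it

## store each of the 12 files under its own key
for (i in 1:12) {
  chunk <- read.csv(sprintf("part%02d.csv", i))
  dbInsert(db, sprintf("part%02d", i), chunk)
  rm(chunk); gc()              # keep at most one chunk in RAM at a time
}

part01 <- dbFetch(db, "part01")   # pull a chunk back into RAM when needed
```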
I haven't tried `ff` yet, as it comes with lots of different classes, and I wonder whether it is worth investing time in understanding `ff` itself before my real work begins. But the `filehash` package seems to have been static for some time, and there is little discussion about it, so I wonder whether `filehash` has become less popular, or even obsolete.
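For comparison, from skimming the `ff` documentation, I gather the import would look roughly like the sketch below. I haven't actually run this, and the file and column names are placeholders:

```r
library(ff)

## the first call creates the on-disk ffdf; later calls append to it
## by passing the existing ffdf as the x= argument
dat <- read.csv.ffdf(file = "part01.csv", header = TRUE)
for (f in sprintf("part%02d.csv", 2:12)) {
  dat <- read.csv.ffdf(x = dat, file = f, header = TRUE)
}

dim(dat)          # ~1M rows x 25 columns, backed by files on disk

## descriptive statistics: materialize one column at a time with []
## ("x1" is a placeholder column name)
mean(dat$x1[])
```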
Can anyone help me choose which package to use, or tell me the differences and pros and cons between them? Thanks.
Update 01
I am currently using `filehash` to import the dataframe, and I have realized that a dataframe imported via `filehash` should be treated as read-only: any further modification to that dataframe will not be stored back to the file unless you explicitly save it again. This is not very convenient in my view, as I need to remind myself to do the saving. Any comment on this?
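To illustrate what I mean, continuing from the import sketch above (columns `a` and `b` are made up): changes to the fetched copy stay in RAM until I explicitly insert the object back into the database:

```r
x <- dbFetch(db, "part01")   # an in-RAM copy of the stored dataframe
x$ratio <- x$a / x$b         # modify the copy
dbInsert(db, "part01", x)    # without this line the change never reaches the file
```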