
In R, data is usually loaded into RAM. Are there any packages that keep the data on disk rather than loading it all into RAM?

jan5
  • In addition to Iterator's and Dirk's answers, for handling large data there are sqldf, RMySQL, and RSQLite (also noted in the HPC Task View). – Roman Luštrik Feb 24 '12 at 14:52
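
For example, a small sketch along those lines with RSQLite, keeping the data in an on-disk database and pulling in only what a query returns (the file and table names are made up for illustration):

    library(DBI)
    library(RSQLite)

    con <- dbConnect(RSQLite::SQLite(), "data.sqlite")   # on-disk database file
    dbWriteTable(con, "measurements",
                 data.frame(id = 1:1e6, value = rnorm(1e6)))
    small <- dbGetQuery(con, "SELECT * FROM measurements WHERE value > 3")
    dbDisconnect(con)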

2 Answers


Check out the bigmemory package, along with related packages like bigtabulate, bigalgebra, biganalytics, and more. There's also ff, though I don't find it as user-friendly as the bigmemory suite; the bigmemory suite was reportedly motivated in part by the difficulty of using ff. I like bigmemory because it required very few changes to my code: a big.matrix object can be manipulated in almost exactly the same ways as a standard matrix, so my code is very reusable.
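
As a minimal sketch of the file-backed workflow (the file names and dimensions here are placeholders, not anything from the original answer):

    library(bigmemory)

    # Create a file-backed big.matrix; the data live on disk, not in RAM.
    x <- filebacked.big.matrix(nrow = 1e6, ncol = 10, type = "double",
                               backingfile = "data.bin",
                               descriptorfile = "data.desc")

    x[1, ] <- rnorm(10)   # ordinary matrix-style indexing
    mean(x[, 1])          # only the touched slice is pulled into memory

    # In a later R session, re-attach the existing backing file:
    y <- attach.big.matrix("data.desc")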

There's also support for HDF5 via NetCDF4, in packages like RNetCDF and ncdf. This is a popular, multi-platform, multi-language method for efficient storage and access of large data sets.
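
As a rough sketch with RNetCDF (the file, dimension, and variable names are made up), you can write a large variable once and later read back only the slice you need:

    library(RNetCDF)

    nc <- create.nc("big.nc")
    dim.def.nc(nc, "obs", 1e6)
    var.def.nc(nc, "x", "NC_DOUBLE", "obs")
    var.put.nc(nc, "x", rnorm(1e6))
    close.nc(nc)

    # Later: read only elements 1..1000 without loading the whole variable
    nc <- open.nc("big.nc")
    slice <- var.get.nc(nc, "x", start = 1, count = 1000)
    close.nc(nc)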

If you want basic memory mapping functionality, look at the mmap package.
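
Roughly like this, assuming a raw binary file of 8-byte doubles (the file name is a placeholder, and I may be misremembering the exact mode constructors):

    library(mmap)

    writeBin(rnorm(1000), "values.bin")        # raw file of 8-byte doubles
    m <- mmap("values.bin", mode = real64())   # map it; pages are read only on access
    m[1:5]                                     # indexing faults in just those pages
    m[1] <- 0                                  # writes go through to the file
    munmap(m)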

Iterator
  • Bigmemory started as just an external pointer to objects in RAM outside of R, plus proper semantics. The file-backed stuff came in response to ff, but that didn't start bigmemory. Your pointers to HDF5 and NetCDF are good and correct too, as is the hint to mmap. – Dirk Eddelbuettel Feb 24 '12 at 14:28

Yes, the ff package can do that.

You may want to look at the Task View on High-Performance Computing for more details.
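
A minimal sketch of what that looks like (the sizes and the CSV path are placeholders):

    library(ff)

    x <- ff(vmode = "double", length = 1e8)   # stored in a file, not in RAM
    x[1:10] <- rnorm(10)                      # chunks are brought into memory on demand
    sum(x[1:10])

    # For data frames, read.csv.ffdf() builds an on-disk ffdf from a large CSV:
    # df <- read.csv.ffdf(file = "big.csv")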

Dirk Eddelbuettel