The O/S has a principal limit on RAM-addressing ( smaller for a 32-bit system, larger for a 64-bit system ).
The O/S next has a design-based limit on the maximum RAM a single process may allocate ( and may kill -s the process afterwards, once it oversteps that limit ).
I had the same InRAM constraints in python and went beyond them. Sure, at some cost, but it was a worthwhile piece of experience.
python numpy has a wonderful feature built in seamlessly for this very scenario - numpy.memmap(). The word seamlessly is emphasised intentionally, as this is of core importance for your problem re-formulation / re-design costs. There are tools available, but it will be your time spent to master 'em and to re-design your algorithm ( libraries et al ) so that these can use the new tools - guess what - SEAMLESSLY. This is the hidden part of the iceberg.
Handy R tools available:
filebacked.big.matrix ( from the bigmemory package )
which also supports HPC cluster-wide sharing for distributed processing ( thus addressing both the PSPACE and the PTIME dimensions of the HPC processing challenge, unless you unfortunately hit the filesystem fileSize ceiling )
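A minimal sketch, re-using the 16809 x 3 / 16809 x 16809 sizes from the ff example below; the backing- and descriptor-file names ( pt_coords_fb.bin et al ) are illustrative assumptions:

library( bigmemory )                               # install.packages( "bigmemory" )

# file-backed matrices live on disk, not InRAM
pt_coords_fb <- filebacked.big.matrix( 16809,     3, type = "double", init =  0,
                                       backingfile    = "pt_coords_fb.bin",
                                       descriptorfile = "pt_coords_fb.desc" )
pt_dists_fb  <- filebacked.big.matrix( 16809, 16809, type = "double", init = -1,
                                       backingfile    = "pt_dists_fb.bin",
                                       descriptorfile = "pt_dists_fb.desc" )

# any other R process ( or a cluster node seeing the same filesystem )
# may re-attach the very same data via the descriptor file:
pt_dists_shared <- attach.big.matrix( "pt_dists_fb.desc" )
pt_dists_shared[ 1, 1:5 ]                          # ordinary [row,column] access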
ff
which allows
library(ff)
pt_coords <- ff( vmode = "double", dim = c(16809, 3), initdata = 0 )
pt_dists <- ff( vmode = "double", dim = c(16809, 16809), initdata = -1 )
and lets you work with these as simply as in a matrix-alike [row,column] mode, to fill in the points and process their pair-wise distances et al ( a chunked sketch follows below ). See ?ffsave for further details on saving your resulting distances data.
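A minimal sketch of that [row,column] workflow, assuming pt_coords has already been filled with the xyz-coordinates and processing pt_dists in row-chunks, so that only a small slice ever sits InRAM at once ( the chunk size is an illustrative assumption ):

library( ff )

n         <- 16809
chunkRows <- 512                                   # illustrative chunk size

xyz <- pt_coords[ , ]                              # 16809 x 3 easily fits InRAM

for ( first in seq( 1, n, by = chunkRows ) ) {
    last  <- min( first + chunkRows - 1, n )
    block <- xyz[ first:last, , drop = FALSE ]
    # squared pair-wise distances of this row-chunk against all points:
    d2 <- outer( rowSums( block^2 ), rowSums( xyz^2 ), "+" ) - 2 * block %*% t( xyz )
    pt_dists[ first:last, ] <- sqrt( pmax( d2, 0 ) )    # write the chunk back to disk
}

ffsave( pt_coords, pt_dists, file = "pt_data" )    # see ?ffsave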
and last, but not least
Parallel? No.
Distributed? Yes, that might help with the PTIME dimension:
As noted with filebacked.big.matrix, there are chances to segment the computational PSPACE into smaller segments for distributed processing and for a reduction of the PTIME, but the concept is in principle just a concurrent (re-)use of the available resources, not a [ PARALLEL ] system-behaviour ( while it is necessary to admit that a lot of marketing texts ( the bad news is that even technology marketing has joined this unfair and knowingly incorrect practice ) misuse the word parallel / parallelism in places where just a concurrent system-behaviour is observed ( there are not many real, true-[ PARALLEL ] systems ) ).
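A hedged sketch of such a PSPACE segmentation, assuming the file-backed matrices from the bigmemory sketch above, a shared filesystem and a plain parallel PSOCK cluster standing in for a real HPC launcher; each worker re-attaches the backing files and writes only its own, disjoint block of rows:

library( parallel )

cl     <- makeCluster( 4 )                         # or a cluster spanning HPC nodes
blocks <- split( seq_len( 16809 ),                 # segment the row-space ( PSPACE )
                 cut( seq_len( 16809 ), 4, labels = FALSE ) )

parLapply( cl, blocks, function( rows, path ) {
    library( bigmemory )
    xyz <- attach.big.matrix( file.path( path, "pt_coords_fb.desc" ) )[ , ]
    d   <- attach.big.matrix( file.path( path, "pt_dists_fb.desc" ) )
    for ( i in rows )                              # each worker fills only its rows
        d[ i, ] <- sqrt( colSums( ( t( xyz ) - xyz[ i, ] )^2 ) )
    NULL
}, path = getwd() )

stopCluster( cl )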
Conclusion:
Big matrices are doable in R well beyond the InRAM limits. Select the tools most suitable for your problem-domain and harness all the HPC resources you can.
Error: cannot allocate vector of size 1.1 Gb
is solved.
It is nothing but resources that impose limits and delays on our computing-ready tasks, so do not hesitate to make your move while the computing resources are still available for your Project, otherwise you will find yourself with all the re-engineered software ready, but waiting in a queue for the computing resources.