
I have a large, sparse matrix saved in an .RData file. The script that accesses this matrix will be kicked off by a console call to Rscript. It is both time- and resource-intensive to load the matrix on every call. Is there a way to hold the matrix in memory so that multiple calls from the console can use it without loading it as an object every single time?

– Unknown Coder

2 Answers


Try the 'bigmemory' package. Basically, you create a matrix with a call to 'big.matrix()' (or convert an existing one with 'as.big.matrix()'), then obtain a descriptor for it, the "hook", through a call to 'describe()'. That hook can then be used to attach the already-loaded matrix in another process via 'attach.big.matrix()'.

Edit: an example:

Start 2 R sessions, 1 & 2

on Session 1:

require(bigmemory)
system.time(M <- matrix(rnorm(1e8), 1e4)) # ~9"
format(object.size(M), "Mb") # ~762Mb
system.time(M <- as.big.matrix(M)) # ~ 3"

hook <- describe(M) # the "hook": a small descriptor other processes can attach
saveRDS(hook, "shared-matrix-hook.rds")
M[1:3,1:3]

on Session 2:

require(bigmemory)
system.time(hook <- readRDS("shared-matrix-hook.rds")) # 0.001"

system.time(Mshared <- attach.big.matrix(hook)) # 0.002"

Mshared[1:3,1:3] # shows the same as session 1 did
Mshared[2,2] = 0 # check on session 1 that this change is present there
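
Note that the shared segment above lives only as long as at least one session keeps the matrix attached; once every attached session exits, the hook is useless. If the matrix should also survive independent sessions, a file-backed big.matrix is one option. A minimal sketch, with hypothetical file names 'M.bin' and 'M.desc' (not part of the original example):

require(bigmemory)
m <- matrix(rnorm(1e4), 1e2)                    # an ordinary in-RAM matrix
Mfb <- as.big.matrix(m, backingfile = "M.bin",
                     descriptorfile = "M.desc") # data are memory-mapped to disk

# any later session or Rscript call can attach via the descriptor file,
# even after the creating session has exited:
Mshared <- attach.big.matrix("M.desc")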
– IBrum
  • Interesting approach. But how does it hold the "state" of the matrix in a quick-to-load manner from one script request to another? – Unknown Coder Apr 04 '17 at 23:49
  • One way to hold the "state" is to save the hook (the result of 'describe()') to a file that is read on each subsequent call. – IBrum Apr 05 '17 at 00:39
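
Tying this back to the question: each console invocation can then be a short script that attaches the shared matrix instead of loading the .RData file. A minimal sketch, assuming a long-running session has created the matrix and saved its descriptor as above ('use-matrix.R' is a hypothetical name):

#!/usr/bin/env Rscript
# use-matrix.R -- attach the shared matrix instead of load()-ing it
require(bigmemory)
hook <- readRDS("shared-matrix-hook.rds")  # tiny file: only the descriptor
M <- attach.big.matrix(hook)               # maps the shared memory, no copy
cat(M[1, 1], "\n")                         # ...do the real work here

Invoked from the console as 'Rscript use-matrix.R'. The creating session must still be holding the data (or use the file-backed variant sketched above).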

Shouldn't the problem of sharing large data fundamentally call for an architecture built around data sharing, such as an in-memory database?
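
For illustration, a minimal sketch of that idea, assuming a Redis server running locally and the 'redux' package (both assumptions, not part of this answer). The matrix then lives in the database's RAM rather than in any one R process:

require(redux)
r <- hiredis()                               # connect to Redis on localhost

# one-off: serialize the matrix into the in-memory DB
M <- matrix(rnorm(1e4), 1e2)
r$SET("big_matrix", object_to_bin(M))

# per console call: fetch from RAM instead of re-reading the .RData file
M2 <- bin_to_object(r$GET("big_matrix"))

The trade-off is that each call still pays for deserialization, so this helps mainly when reading the .RData file from disk is the bottleneck.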