30

There are at least two sparse matrix packages for R. I'm looking into these because I'm working with datasets that are too big and sparse to fit in memory with a dense representation. I want basic linear algebra routines, plus the ability to easily write C code to operate on them. Which library is the most mature and best to use?

So far I've found

  • Matrix which has many reverse dependencies, implying it's the most used one.
  • SparseM which doesn't have as many reverse deps.
  • Various graph libraries probably have their own (implicit) versions of this; e.g. igraph and network (the latter is part of statnet). These are too specialized for my needs.

Anyone have experience with this?

From searching around RSeek.org a little bit, the Matrix package seems the most commonly mentioned one. I often think of CRAN Task Views as fairly authoritative, and the Multivariate Task View mentions Matrix and SparseM.

Argalatyr
Brendan OConnor
  • 3
    I think there's `spam` too. The help says: `Differences with SparseM/Matrix are: (1) we only support (essentially) one sparse matrix format, (2) based on transparent and simple structure(s), (3) tailored for MCMC calculations within GMRF. (4) S3 and S4 like-"compatible" ... and it is fast.` Reverse depends: CollocInfer, esd4all, fields, latticeDensity, LatticeKrig, pencopula, rworldmap, splm – Ben Bolker Apr 19 '12 at 18:09
  • Voting to close as tool rec. – Ciro Santilli OurBigBook.com Oct 13 '15 at 14:27

3 Answers

21

Matrix is the most common and has also just been accepted into the standard R installation (as of 2.9.0), so it should be broadly available.

Matrix in base: https://stat.ethz.ch/pipermail/r-announce/2009/000499.html

David Lawrence Miller
7

In my experience, Matrix is the best supported and most mature of the packages you mention. Its C architecture should also be fairly well-exposed and relatively straightforward to work with.

AWB
1

log(x) on a sparse matrix is a bad idea: log(0) isn't finite (R returns -Inf), and most elements of a sparse matrix are zero, so the result would no longer be sparse.

If you just want the log of the non-zero elements, try converting to a triplet sparse representation and taking the log of those values.
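A minimal sketch of the triplet approach, assuming the Matrix package is the one in use. The matrix `m` and its values are made up for illustration. Note that this transforms every stored entry, so any explicitly stored zeros would still become -Inf; calling `drop0()` first should remove them.

```r
library(Matrix)

# Illustrative 3 x 3 sparse matrix with three non-zero entries (made-up values).
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 3),
                  x = c(1, 10, 100), dims = c(3, 3))

# Convert to triplet (i, j, x) form; the x slot holds only the stored values.
tm <- as(m, "TsparseMatrix")

# Take the log of the stored values only; the implicit zeros are left untouched.
tm@x <- log(tm@x)

# Convert back to compressed column form for further linear algebra.
logged <- as(tm, "CsparseMatrix")
```

The same idea works with `log1p()` in place of `log()` if log(1+x) is what you are after.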

  • 1
    oops. i meant log(1+x) actually. i guess this doesn't make any sense. yeah, i do it with the triplet representation, which makes much more sense. – Brendan OConnor Nov 02 '09 at 14:54