7

I have the upper triangular part of matrix in R (without diagonal) and want to generate a symmetric matrix from the upper triangular part (with 1 on the diagonal but that can be adjusted later). I usually do that like this:

res.upper <- rnorm(4950)
res <- matrix(0, 100, 100)
res[upper.tri(res)] <- res.upper
rm(res.upper)
diag(res) <- 1
res[lower.tri(res)]  <- t(res)[lower.tri(res)]

This works fine but now I want to work with very large matrices. Thus, I would want to avoid having to store res.upper and res (filled with 0) at the same time. Is there any way I can directly convert res.upper to a symmetric matrix without having to initialize the matrix res first?

Lila
  • 155
  • 2
  • 9
  • I could write the compiled code, I also do that sometimes to speed up my functions. However, I do not really understand how that avoids using extra memory. In the C/C++ code, I would also first initialize an object like res above. Wouldn't that also use extra memory? Or is that not a problem when using C/C++ because memory allocation is more "intelligent" in these languages? That might be a stupid question but I am a statistician and not a computer scientist, so I really don't know how memory allocation works internally. – Lila Jun 03 '16 at 12:10
  • You don't have to write the function for me, that's not a problem. I am familiar with compilation and the inline package. I just don't have enough background to understand how that solves my memory problem. But if you assure me that it does I'll write the function (and I'll accept your answer if you give it as an answer instead of a comment). – Lila Jun 03 '16 at 12:13
  • Have you tried using `big.matrix` from the `bigmemory` package? Could be a way around your memory limitations – konvas Jun 03 '16 at 12:22
  • The final matrix anyway has to be converted to class kernelMatrix from the library kernlab. I don't know whether this will work with big.matrix but I will definitely have a look at it! Thank you for the hint! I did not know that package, this is the first time I have to work with such big matrices. – Lila Jun 03 '16 at 12:33
  • You might, also, find helpful "Matrix" package -- i.e. `sparseMatrix(i = sequence(1:99), j = rep(2:100, 1:99), x = res.upper, symmetric = TRUE, dims = c(100, 100))` to avoid multiple copying and using (probably) smaller objects – alexis_laz Jun 03 '16 at 13:26
  • @alexis_laz: Thank you for the hint as well! I think I am going with the C solution provided by Zheyuan Li. But I will still have a look at sparseMatrix, cannot hurt to know about it for future problems. – Lila Jun 03 '16 at 22:14

2 Answers2

6

I think there are two issues here.

now I want to work with very large matrices

Then do not use R code to do this job. R will use much more memory than you expect. Try the following code:

res.upper <- rnorm(4950)
res <- matrix(0, 100, 100)
tracemem(res)  ## trace memory copies of `res`
res[upper.tri(res)] <- res.upper
rm(res.upper)
diag(res) <- 1
res[lower.tri(res)]  <- t(res)[lower.tri(res)]

This is what you will get:

> res.upper <- rnorm(4950)  ## allocation of length 4950 vector
> res <- matrix(0, 100, 100)  ## allocation of 100 * 100 matrix
> tracemem(res)
[1] "<0xc9e6c10>"
> res[upper.tri(res)] <- res.upper
tracemem[0xc9e6c10 -> 0xdb7bcf8]: ## allocation of 100 * 100 matrix
> rm(res.upper)
> diag(res) <- 1
tracemem[0xdb7bcf8 -> 0xdace438]: diag<-  ## allocation of 100 * 100 matrix
> res[lower.tri(res)]  <- t(res)[lower.tri(res)]
tracemem[0xdace438 -> 0xdb261d0]: ## allocation of 100 * 100 matrix
tracemem[0xdb261d0 -> 0xccc34d0]: ## allocation of 100 * 100 matrix

In R, you have to use 5 * (100 * 100) + 4950 double words to finish these operations. While in C, you only need at most 4950 + 100 * 100 double words (In fact, 100 * 100 is all that is needed! Will talk about it later). It is difficult to overwrite object directly in R without extra memory assignment.

Is there any way I can directly convert res.upper to a symmetric matrix without having to initialize the matrix res first?

You do have to allocate memory for res because that is what you end up with; but there is no need to allocate memory for res.upper. You can initialize the upper triangular, while filling in the lower triangular at the same time. Consider the following template:

#include <Rmath.h>  // use: double rnorm(double a, double b)
#include <R.h>  // use: getRNGstate() and putRNGstate() for randomness
#include <Rinternals.h>  // SEXP data type

## N is matrix dimension, a length-1 integer vector in R
## this function returns the matrix you want
SEXP foo(SEXP N) {
  int i, j, n = asInteger(N);
  SEXP R_res = PROTECT(allocVector(REALSXP, n * n));  // allocate memory for `R_res`
  double *res = REAL(R_res);
  double tmp;  // a local variable for register reuse
  getRNGstate();
  for (i = 0; i < n; i++) {
    res[i * n + i] = 1.0;  // diagonal is 1, as you want
    for (j = i + 1; j < n; j++) {
      tmp = rnorm(0, 1);  
      res[j * n + i] = tmp; // initialize upper triangular
      res[i * n + j] = tmp;  // fill lower triangular
      }
    }
  putRNGstate();
  UNPROTECT(1);
  return R_res;
  }

The code has not been optimized, as using integer multiplication j * n + i for addressing in the innermost loop will result in performance penalty. But I believe you can move multiplication outside the inner loop, and only leave addition inside.

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
  • Thank you for the explanation! I accepted your answer. I'll follow your suggestion and write that part in C or C++ then. For other readers: Zheyuan Li suggested to use compiled code (via the inline package in R) and I asked him for an explanation on how this solves my memory problem. – Lila Jun 03 '16 at 12:43
  • Just a note -- in your second example (x = 1:4...), 'x' has to be converted, either way, as a whole because of assigning a "double" to an "integer"; i.e. `x[1] = 4L` should not copy – alexis_laz Jun 03 '16 at 13:30
  • I guess something like `y = x` or simulating a function call `(function(val) val)(x)` should mark "x" as to-be-copied in a subsequent call to `"[<-"` – alexis_laz Jun 03 '16 at 13:45
  • Thank you for the template! – Lila Jun 03 '16 at 22:10
2

To get a symmetric matrix from an upper or lower triangular matrix you can add the matrix to its transpose and subtract the diagonal elements. The equation is linked below.

diag(U) is a diagonal matrix with the diagonal elements of U.

Equation 1

ultosymmetric=function(m){
  m = m + t(m) - diag(diag(m))
return (m)}

If you want the diagonal elements to be 1 you can do this.

ultosymmetric_diagonalone=function(m){
  m = m + t(m) - 2*diag(diag(m)) + diag(1,nrow=dim(m)[1])
return (m)}

Equation 2

Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • For the sake of simplification, you don't need `m =` and `return(m)` in your functions. R returns the last evaluated expression by default. – Martin Gal Apr 13 '22 at 21:16