13

I have a matrix mat and a vector v. I would like to multiply first column of matrix matby first element of vector v and multiply the second column of matrix mat by second element of vector v. I can do it as shown. How can I do this faster in R since we get a big matrix?

    mat = matrix(rnorm(1500000), ncol= 100)
    v= rnorm(100)
    > system.time( mat %*% diag(v))
      user  system elapsed 
      0.02    0.00    0.02 
IRTFM
  • 258,963
  • 21
  • 364
  • 487
rose
  • 1,971
  • 7
  • 26
  • 32
  • That must be a mighty big matrix you have given the `0.02` elapsed time for 1.5M values. – thelatemail Aug 21 '13 at 04:58
  • 2
    Oddly, this is basically a duplicate of your older question [here](http://stackoverflow.com/q/17080099/489704). `:o` – jbaums Jun 18 '14 at 05:33

3 Answers3

14

Recycling can make it faster but you recycle within columns, not across, so just transpose and transpose back.

t( t(mat) * v )

This should be faster than sweep or %*%.

microbenchmark(mat %*% diag(v),sweep(mat, 2, v, FUN = "*"), t(t(mat)*v))
Unit: milliseconds
            expr       min        lq    median        uq      max neval
             %*% 150.47301 152.16306 153.17379 161.75416 281.3315   100
           sweep  35.94029  42.67210  45.53666  48.07468 168.3728   100
   t(t(mat) * v)  16.50813  23.41549  26.31602  29.44008 160.1651   100
John
  • 23,360
  • 7
  • 57
  • 83
11

A bit late to the game, but did someone say fastest?! This could be another good use of Rcpp. This function (called mmult) will by default multiply each column of a matrix by each successive element of a vector, but has the option to do this by column, by setting byrow = FALSE. It also checks that m and v are of an appropriate size given the byrow option. Anyway, it's fast (around 10-12 times quicker than the best native R answer)...

EDIT

@chris provided this great answer to another question I posed trying to get this to work with RcppArmadillo. However it seems that the Rcpp-only function I posted here is still around 8 times quicker than that, and around 70 times quicker than the OP method. Click the link for the code for @chris' function - it's beautifully simple.

I'll put the benchmarking at the top..

require( microbenchmark )
m <- microbenchmark( mat %*% diag(v) , mmult( mat , v ) , sweep(mat, 2, v, FUN = "*") , chris( mat , v ) , t( t(mat) * v ) , times = 100L )
print( m , "relative" , order = "median" , digits = 3 )
Unit: relative
                        expr   min    lq median    uq   max neval
               mmult(mat, v)  1.00  1.00   1.00  1.00  1.00   100
               chris(mat, v) 10.74  9.31   8.15  7.27 10.44   100
               t(t(mat) * v)  9.65  8.75   8.30 15.33  9.52   100
 sweep(mat, 2, v, FUN = "*") 20.51 18.35  22.18 21.39 16.94   100
             mat %*% diag(v) 80.44 70.11  73.12 70.68 54.96   100

Browse on to see how mmult works and returns the same result as OP...

require( Rcpp )

#  Source code for our function
func <- 'NumericMatrix mmult( NumericMatrix m , NumericVector v , bool byrow = true ){
  if( byrow );
    if( ! m.nrow() == v.size() ) stop("Non-conformable arrays") ;
  if( ! byrow );
    if( ! m.ncol() == v.size() ) stop("Non-conformable arrays") ;

  NumericMatrix out(m) ;

  if( byrow ){
    for (int j = 0; j < m.ncol(); j++) {
      for (int i = 0; i < m.nrow(); i++) {
        out(i,j) = m(i,j) * v[j];
      }
    }
  }
  if( ! byrow ){
    for (int i = 0; i < m.nrow(); i++) {
      for (int j = 0; j < m.ncol(); j++) {
        out(i,j) = m(i,j) * v[i];
      }
    }
  }
  return out ;
}'

#  Make it available
cppFunction( func )

#  Use it
res1 <- mmult( m , v )

#  OP function
res2 <- mat %*% diag(v)

#  Same result?
identical( res1 , res2 ) # Yes!!
[1] TRUE
Community
  • 1
  • 1
Simon O'Hanlon
  • 58,647
  • 14
  • 142
  • 184
  • Is the intended behaviour of `mmult` to modify the original `m`? e.g., take a look at: `m <- m2 <- matrix(runif(9), nc=3); v <- 1:3; z <- mmult(m, v); identical(m, z); identical(m, m2)`. With double matrices, the original matrix is overwritten by the function's output! Some kind of pointer sorcery? – jbaums Jun 15 '14 at 17:02
  • @jbaums let me get back to you... Not sure at the mo. – Simon O'Hanlon Jun 16 '14 at 01:02
  • I don't really understand c++ very well, but if you use `NumericMatrix out(m.nrow(), m.ncol());`, the problem doesn't occur, but it slows things down a bit (approx 6 times as long for `mat` and `vec` given in the question). – jbaums Jun 16 '14 at 01:59
4

sweep seems to run a bit faster on my machine

sweep(mat, 2, v, FUN = "*")

Some benchmarks:

> microbenchmark(mat %*% diag(v),sweep(mat, 2, v, FUN = "*"))

Unit: milliseconds
  expr       min        lq   median        uq      max neval
   %*% 214.66700 226.95551 231.2366 255.78493 349.1911   100
 sweep  42.42987  44.72254  62.9990  70.87403 127.2869   100
thelatemail
  • 91,185
  • 12
  • 128
  • 188
Dason
  • 60,663
  • 9
  • 131
  • 148
  • Why I use your command, it is slower than my code? system.time(sweep(mat, 2, v, FUN = "*")) user system elapsed 0.06 0.02 0.08 – rose Aug 21 '13 at 05:02
  • 2
    @rose For me ```sweep``` is faster in R under OSX but slower under ubuntu. In general, I think you are going to struggle to improve much on ```%*%``` because it is implemented directly in C (line 610 of src/main/array.c). I suppose you could remove some of the sanity checking from that function but the gains are likely to be very small. – orizon Aug 21 '13 at 05:38
  • @orizon Interesting - The results are the same for me (faster on OSX but slower on Ubuntu/Mint). I was actually surprised when sweep was faster when using my wife's Mac. – Dason Aug 21 '13 at 13:27