1

I have two large matrices P and Q around (10k x 50k dim in both, but to test this yourself a random 10x10 matrix for P and Q is sufficient). I have a list of indices, e.g.

i    j
1    4
1    625
1    9207
2    827
...  ...

etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?

edit: asked for a reproducible example

P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3) 
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3) 
i <- c(1,1,2) 
j <- c(2,1,3)

gives output (if in dot product form)

1: 0
2: 1
3: 1
Sotos
  • 51,121
  • 6
  • 32
  • 66
digestivee
  • 690
  • 1
  • 8
  • 16
  • 1
    [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Sep 28 '17 at 12:35
  • here u go: P <- matrix(runif(100), nrow=10,ncol=10) Q <- matrix(runif(100),nrow = 10, ncol=10) i <- c(1,2,4,7) j <- c(5,3,7,2) – digestivee Sep 28 '17 at 12:37
  • `set.seed` for reproducibility, and put this in your question. Not in the comments. – lmo Sep 28 '17 at 12:39
  • @TrotteBoman why don't you go through the 'trouble' of providing a reproducible example, that is, consistent inputs -> expected output. We would appreciate that very much from you! – CPak Sep 28 '17 at 12:39
  • I will do it, one second. – digestivee Sep 28 '17 at 12:41

2 Answers2

3
P <- matrix(1:50, nrow = 5,ncol = 10) 
Q <- matrix(1:50, nrow = 5, ncol = 10) 
i <- c(1,2,4,7) 
j <- c(5,3,7,2)
P
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    6   11   16   21   26   31   36   41    46
# [2,]    2    7   12   17   22   27   32   37   42    47
# [3,]    3    8   13   18   23   28   33   38   43    48
# [4,]    4    9   14   19   24   29   34   39   44    49
# [5,]    5   10   15   20   25   30   35   40   45    50
Q
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    6   11   16   21   26   31   36   41    46
# [2,]    2    7   12   17   22   27   32   37   42    47
# [3,]    3    8   13   18   23   28   33   38   43    48
# [4,]    4    9   14   19   24   29   34   39   44    49
# [5,]    5   10   15   20   25   30   35   40   45    50
P[,i] * Q[, j]
#      [,1] [,2] [,3] [,4]
# [1,]   21   66  496  186
# [2,]   44   84  544  224
# [3,]   69  104  594  264
# [4,]   96  126  646  306
# [5,]  125  150  700  350
minem
  • 3,640
  • 2
  • 15
  • 29
2

Using matrix multiplication, you can do

diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1

Here is second a solution with apply.

apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1

To verify these agree in a second example:

set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)

inds <- matrix(sample(10, 10, replace=TRUE), 5)

matrix multiplication

diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317

and apply

apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317
lmo
  • 37,904
  • 9
  • 56
  • 69
  • Nice solution! Seems that minems answer is faster for my problem (14s for this solution and 1.6s for minems) so I will go with that for now – digestivee Sep 28 '17 at 12:57
  • Yeah. `apply` can be slow. The matrix multiplication method I give might be a bit faster than `apply` for small matrices, but will get slower as the matrices grow due to calculating a full matrix, where only the diagonal result is needed. – lmo Sep 28 '17 at 13:00
  • 1
    I still appreciate the answer very much, I really am awful with apply but I am trying to learn it better, so seeing it applied in scenarios like this is very helpful. Actually I thought that some solution using apply would be fastest here but minems solution was very simple yet effective – digestivee Sep 28 '17 at 13:03
  • 1
    Actually, I even need to use your solution for the very next thing I am doing in my algorithm! Sometimes things work out great – digestivee Sep 28 '17 at 13:16