1

In R, I have two matrices, x and y, which both have the same number of columns, say, for example:

x <- matrix(runif(10*20),10,20)
y <- matrix(runif(50*20),50,20)

What is the most efficient way to build a matrix holding the result of the following comparison? Compare each row of x with each row of y (10x50 row comparisons) and count how many entries in the row of y are smaller than the entry at the same position in the row of x. Store the counts in a 10x50 result matrix, where entry [i, j] is the count for row i of x and row j of y.

The following code works, but it is not efficient:

result <- matrix(NA, 10, 50)
for (i in 1:10) {
  for (j in 1:50) {
    result[i, j] <- sum(x[i, ] > y[j, ])
  }
}
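
To make the expected output concrete, here is a minimal sketch of the same double loop on toy inputs. The names x_small, y_small and res_small and their values are made up for illustration (they are not part of the original data), and the counts are small enough to check by hand.

```r
# Toy inputs (hypothetical values, small enough to verify by hand)
x_small <- matrix(c(1, 5, 3,
                    4, 7, 6), nrow = 2, byrow = TRUE)
y_small <- matrix(c(0, 0, 0,
                    5, 5, 5,
                    2, 6, 2), nrow = 3, byrow = TRUE)

# Same double loop as above, sized from the inputs instead of hard-coded
res_small <- matrix(NA_integer_, nrow(x_small), nrow(y_small))
for (i in seq_len(nrow(x_small))) {
  for (j in seq_len(nrow(y_small))) {
    # count positions where row i of x exceeds row j of y
    res_small[i, j] <- sum(x_small[i, ] > y_small[j, ])
  }
}
res_small
#      [,1] [,2] [,3]
# [1,]    3    0    1
# [2,]    3    2    3
```

For example, res_small[1, 3] is 1 because only the third entry of (1, 5, 3) exceeds the matching entry of (2, 6, 2).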
Simon Wuya

3 Answers

3

Indeed your code doesn't run, but I think you mean y <- matrix(runif(50*20),50,20), correct?

In that case you could use the outer function:

outer(rowSums(x), rowSums(y), function(x, y) x > y)

EDIT

I see what you mean, sorry; I could have worked that out despite the error. I think this will speed up your task considerably:

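library(magrittr)  # provides the %>% pipe
# pair every row of x with every row of y, then count per pair
result2 <- rowSums(x[rep(1:nrow(x), nrow(y)), ] >
     y[rep(1:nrow(y), each = nrow(x)), ]) %>%
    matrix(nrow = nrow(x))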
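
To see why the two rep() calls line up, here is a small sketch on toy inputs. The names x_small, y_small, ix, iy, counts and expanded and the values are hypothetical, chosen only for illustration; the final reshape is written without the pipe so the block needs no extra package.

```r
# Toy inputs (hypothetical values)
x_small <- matrix(c(1, 5, 3,
                    4, 7, 6), nrow = 2, byrow = TRUE)
y_small <- matrix(c(0, 0, 0,
                    5, 5, 5,
                    2, 6, 2), nrow = 3, byrow = TRUE)

# Row indices for all (x row, y row) pairs: x cycles fastest, y slowest
ix <- rep(seq_len(nrow(x_small)), times = nrow(y_small))  # 1 2 1 2 1 2
iy <- rep(seq_len(nrow(y_small)), each = nrow(x_small))   # 1 1 2 2 3 3

# One comparison on the stacked rows, then one count per pair
counts <- rowSums(x_small[ix, , drop = FALSE] > y_small[iy, , drop = FALSE])

# matrix() fills column by column, so column j collects the pairs with y row j
expanded <- matrix(counts, nrow = nrow(x_small))
expanded
#      [,1] [,2] [,3]
# [1,]    3    0    1
# [2,]    3    2    3
```

The trade-off is memory: both expanded matrices have nrow(x) * nrow(y) rows, so this works well for inputs of the size in the question but grows quickly for larger ones.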
Edwin
  • I did indeed mean that, sorry; the question has been edited and the code now works. Your answer returns booleans, though; mine returns an integer with the number of occurrences of x > y – Simon Wuya Sep 18 '15 at 07:39
1

I guess you mean y <- matrix(runif(50)). In that case you can use a single loop to speed up the computation:

t(apply(y,1,function(u) rowSums(x<u)))
Colonel Beauvel
0

This answer is based on @ColonelBeauvel's answer. To speed up the computation you could use one loop instead of two and loop over the smaller matrix (in your example x).

t(apply(x, 1, function(u)colSums(u > t(y))))

Another important detail is the comparison u > t(y). R recycles the vector u down the columns of a matrix, which is why it is important to transpose y first: each column of t(y) is then one row of y.
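
A minimal sketch of what goes wrong without the transpose, on toy values (u, y_small, wrong, right and reference are hypothetical names and numbers used only for this illustration):

```r
u <- c(1, 5, 3)  # stands in for one row of x (hypothetical values)
y_small <- matrix(c(0, 0, 0,
                    5, 5, 5,
                    2, 6, 2), nrow = 3, byrow = TRUE)

# Without t(): u is recycled down each column of y_small, so u[k] is
# compared with y_small[k, j], i.e. against the columns of y, not its rows.
wrong <- colSums(u > y_small)

# With t(): each column of t(y_small) is one row of y_small, so u[k] is
# compared with y_small[j, k], which is the comparison the question asks for.
right <- colSums(u > t(y_small))

# Row-by-row reference computed with an explicit apply()
reference <- apply(y_small, 1, function(v) sum(u > v))

wrong      # 2 1 2
right      # 3 0 1
reference  # 3 0 1
```

Note that R raises no error or warning in the untransposed case when the matrix size is a multiple of length(u), as it is here and in the question's data, so the mistake is easy to miss.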

Complete example with benchmarking:

set.seed(1)
x <- matrix(runif(10*20),10,20)
y <- matrix(runif(50*20),50,20)

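f0 <- function(x, y) {
  # size the result from the inputs rather than hard-coding 10 and 50
  result <- matrix(NA, nrow(x), nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      result[i, j] <- sum(x[i, ] > y[j, ])
    }
  }
  result
}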

f1 <- function(x, y) t(apply(x, 1, function(u) colSums(u > t(y))))

all.equal(f0(x, y), f1(x, y))
# [1] TRUE

library(rbenchmark)  # provides benchmark()
benchmark(f0(x, y), f1(x, y), order="relative")
#       test replications elapsed relative user.self sys.self user.child sys.child
# 2 f1(x, y)          100   0.035    1.000     0.032    0.004          0         0
# 1 f0(x, y)          100   0.253    7.229     0.252    0.000          0         0
sgibb