2

I've got a problem with writing around apply(), which works really slow. My assignment is to give the range of the rows of a Matrix without using apply(). I'm trying my best but I still need help...

This is what I got so far:

row.range <- function(X){
  Y <- matrix(0, nrow = nrow(X), ncol = 2)

    for(i in nrow(X)){
    Y[i, 1] <- min(X[i, ])
    Y[i, 2] <- max(X[i, ])
  }
  return(Y)
  print(Y)
}

Where X could be any numeric matrix. Sadly, the output is only 0s, except for the last row where I actually get the right minimum and maximum. Why does this only work for the last row?

For testing I used:

M <- matrix(sample(1:6, size = 100 * 5, replace = TRUE), ncol = 5)
row.range(X)

Any help would be very much appreciated :-)

cholz
  • 83
  • 2
  • 12
  • 1
    Your last question is easy to answer, because the for-loop, "loops" only thru the last value of 'nrow(X)'. The loop should be: 'for i in 1:(nrow(X) ...'. About your performance question: I don't understand what you're trying to do. A for- loop will always be slower then 'apply' as 'apply' works on C speed where your for loop is interpreted. – jboi Jan 05 '16 at 11:17
  • 2
    @jobi you are wrong regarding `apply`. It hides a simple R `for` loop within. This was discussed many times already, see [this discussion, for instance](http://stackoverflow.com/questions/28983292/is-the-apply-family-really-not-vectorized). Though the issue isn't regarding a R loop or compiled C loop (there isn't much of the difference between the two), rather what is being performed within that loop. So `apply` can't just compile the stuff you pass to it. You can use the `compiler` package though in order to try improving your loops performance. – David Arenburg Jan 05 '16 at 11:21

2 Answers2

5

Using a smaller reproducible example

set.seed(123)
M <- matrix(sample(1:6, size = 10 * 5, replace = TRUE), ncol = 5)

You could try the already fully optimized matrixStats::rowRanges function

matrixStats::rowRanges(M)
#       [,1] [,2]
#  [1,]    1    6
#  [2,]    3    6
#  [3,]    3    5
#  [4,]    3    6
#  [5,]    1    6
#  [6,]    1    6
#  [7,]    2    5
#  [8,]    1    6
#  [9,]    2    4
# [10,]    1    6

Or base R vectorized max.col function

cbind(M[cbind(1:nrow(M), max.col(-M))],
      M[cbind(1:nrow(M), max.col(M))])
#       [,1] [,2]
#  [1,]    1    6
#  [2,]    3    6
#  [3,]    3    5
#  [4,]    3    6
#  [5,]    1    6
#  [6,]    1    6
#  [7,]    2    5
#  [8,]    1    6
#  [9,]    2    4
# [10,]    1    6

Another semi-vectorized base R approach it to use pmin/pmax combined with do.call (which offers possible NA handling too), but that will require converting your matrix to a data.frame (not recommended)

DF <- as.data.frame(M)
cbind(do.call(pmin.int, c(na.rm = TRUE, DF)),
      do.call(pmax.int, c(na.rm = TRUE, DF)))
#       [,1] [,2]
#  [1,]    1    6
#  [2,]    3    6
#  [3,]    3    5
#  [4,]    3    6
#  [5,]    1    6
#  [6,]    1    6
#  [7,]    2    5
#  [8,]    1    6
#  [9,]    2    4
# [10,]    1    6

As R being a vectorized language, by row operations will be often slow, thus either try vectorizing or use a package such as Rcpp in order to write compiled C/C++ loops (as was done in the first case)

In most radical cases, you still have hope to greatly optimize your loop using the compiler package


As regarding to your for loop (as was already mentioned by @PereG) you have a syntax error. Instead of for(i in nrow(X)) this should be for(i in 1:nrow(X)). Otherwise you are operating only on the last row.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
2

Comparison/Benchmark answers/ideas and original code for information.

Data matrix generated by M <- matrix(sample(1:6, size = 1e6 * 5, replace = TRUE), ncol = 5)

Code:

row.range <- function(X){
  Y <- matrix(0, nrow = nrow(X), ncol = 2)

  for(i in 1:nrow(X)){
    Y[i, 1] <- min(X[i, ])
    Y[i, 2] <- max(X[i, ])
  }
  return(Y)
}

testapply <- function(x) {
  t(apply(M,1,function(x) { c(min(x),max(x))} ))
}

testcbind <- function(x) {
  Min <- x[cbind(1:nrow(x),max.col(-x))]
  Max <- x[cbind(1:nrow(x),max.col(x))]
  return(cbind(Min,Max))
}

testpmin <- function(x) {
  DF <- as.data.frame(x)
  cbind(do.call(pmin.int, c(na.rm = TRUE, DF)),
        do.call(pmax.int, c(na.rm = TRUE, DF)))
}

Validation:

> head(testpmin(M))
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    1    6
[4,]    3    6
[5,]    1    5
[6,]    1    4
> head(testcbind(M))
     Min Max
[1,]   1   5
[2,]   2   6
[3,]   1   6
[4,]   3   6
[5,]   1   5
[6,]   1   4
> head(testapply(M))
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    1    6
[4,]    3    6
[5,]    1    5
[6,]    1    4
> head(row.range(M))
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    1    6
[4,]    3    6
[5,]    1    5
[6,]    1    4

Benchmark:

> microbenchmark(row.range(M),testapply(M),testcbind(M),testpmin(M),times=10)
Unit: milliseconds
         expr       min        lq      mean    median        uq       max neval
 row.range(M) 3935.1435 4620.8645 4969.8812 5001.3030 5316.3731 5808.4092    10
 testapply(M) 2819.5916 2912.5050 3272.5916 3168.1254 3735.2308 3932.8697    10
 testcbind(M)  248.3587  256.4928  364.5640  282.8879  496.4234  633.2248    10
  testpmin(M)  163.0500  173.0381  203.5254  188.8449  197.8690  385.3048    10
Tensibai
  • 15,557
  • 1
  • 37
  • 57