3

One has a matrix Lambda with p columns and n rows, and for each row wants to set all values to 0 except the value in the first column and the maximum of the values in the other columns (in other words, zero out the p - 2 smallest values after excluding the first column).

For the moment I am doing this with a for loop, as follows:

set.seed(60)
(Lambda = matrix(sample.int(30),5))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   19   20   27   18   15   25
[2,]   16   28    1    4   22    7
[3,]    2   10    8   23    3   12
[4,]    5    6    9   17   11   29
[5,]   26   30   24   13   14   21

m <- ncol(Lambda) - 2
for(ir in seq_len(nrow(Lambda))){
    Lambda[ir, match(tail(sort(abs(Lambda[ir, 2:ncol(Lambda)]), decreasing = TRUE), m), Lambda[ir,])] <- 0
}
Lambda
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   19    0   27    0    0    0
[2,]   16   28    0    0    0    0
[3,]    2    0    0   23    0    0
[4,]    5    0    0    0    0   29
[5,]   26   30    0    0    0    0

Fine, this achieves the goal, but with many rows it would become a problem. Is there a solution that does not use a for loop? It could be with lapply, but I am not sure whether that would really be efficient. Maybe with data.table after converting the matrix?

Thank you!

markus
iago
  • How big is the actual matrix you are going to use this one? – s_baldur Oct 16 '20 at 17:01
  • Actually my matrix is not so big (any case, less than 1 million rows). It is not a problem for me but I would prefer to solve this question in a better way – iago Oct 16 '20 at 17:04
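For reference, the data.table route the question asks about could be sketched as follows (an untested sketch: it melts the matrix to long form, zeroes every non-maximum value per row outside column V1, then casts back; `res` is a made-up name):

```r
library(data.table)

set.seed(60)
Lambda <- matrix(sample.int(30), 5)

dt <- as.data.table(Lambda)        # columns V1..V6
dt[, row := .I]                    # remember each row's id
long <- melt(dt, id.vars = "row")  # one (row, variable, value) triple per cell
# outside V1, keep only each row's maximum (with ties, all tied maxima survive)
long[variable != "V1", value := value * (value == max(value)), by = row]
res <- as.matrix(dcast(long, row ~ variable, value.var = "value")[, -1])
```

The reshaping overhead likely makes this slower than the matrix-based answers below, so it is shown only because the question mentions it.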

3 Answers

3

So, concerning:

Is there a solution not using a for loop?

For some algorithms, you just have to write a for loop. And that's okay! Swapping a for loop for something like lapply is not really a performance improvement (see https://stackoverflow.com/a/42440872/4917834).

It is possible to speed up your code though:

# your example
set.seed(60)
Lambda = matrix(sample.int(30),5)

original <- function(Lambda) {
  m <- ncol(Lambda) - 2
  for (ir in seq_len(nrow(Lambda))){
    Lambda[ir, match(tail(sort(abs(Lambda[ir, 2:ncol(Lambda)]), decreasing = TRUE), m), Lambda[ir,])] <- 0
  }
  Lambda
}
original(Lambda)

# a faster alternative
proposal <- function(Lambda) {

  nc <- ncol(Lambda)
  for (i in seq_len(nrow(Lambda))) {
    m <- which.max(abs(Lambda[i, -1L]))
    Lambda[i, (2:nc)[-m]] <- 0
  }
  Lambda
}
proposal(Lambda)

Let's benchmark the two approaches:

bch <- bench::mark(
  original(Lambda),
  proposal(Lambda)
)
summary(bch, relative = TRUE)
# A tibble: 2 x 13
  expression              min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc 
  <bch:expr>            <dbl>  <dbl>     <dbl>     <dbl>    <dbl> <int> <dbl> 
1 original(Lambda)       25.7   24.1       1           1     1     1447     4
2 proposal(Lambda)        1      1        23.6         1     2.57  9997     3

So proposal is about 24 times faster than your original solution (the median time for original is 313.8µs; for proposal it is 13.1µs). If that is not fast enough, it might be worthwhile to look for a package that has implemented this in C or C++. I played around with matrixStats but had no luck. Alternatively, you could port this to C++ with Rcpp, which should also speed up the code.
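For completeness, such an Rcpp port could look roughly like this (a sketch assuming Rcpp is installed; `zero_non_max` is a made-up name, and it mirrors the loop in proposal() including the abs()):

```r
Rcpp::cppFunction('
NumericMatrix zero_non_max(NumericMatrix Lambda) {
  int nr = Lambda.nrow(), nc = Lambda.ncol();
  NumericMatrix out = clone(Lambda);   // do not modify the input in place
  for (int i = 0; i < nr; ++i) {
    // find the column (among 2..nc) with the largest absolute value
    int best = 1;
    for (int j = 2; j < nc; ++j)
      if (std::fabs(out(i, j)) > std::fabs(out(i, best))) best = j;
    // zero everything except column 1 and the winner
    for (int j = 1; j < nc; ++j)
      if (j != best) out(i, j) = 0;
  }
  return out;
}')
```

The compiled loop avoids the per-row subsetting and copying that the R-level loop pays for.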

Vandenman
3

Here is one option that seems a bit faster than proposal() on 600k rows:

foo <- function(Lambda) {
  nr <- nrow(Lambda)
  keep <- c(seq_len(nr), apply(Lambda[, -1], 1, which.max)*nr + seq_len(nr))
  replace(Lambda, -keep, 0L)
}
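The keep vector works because R stores matrices column-major: element (i, j) of an n-row matrix has linear index (j - 1) * n + i, so which.max(...) * nr + seq_len(nr) points straight at the winning entries of columns 2..p, and the negative index in replace() zeroes everything else. A tiny demo of that indexing:

```r
M <- matrix(1:6, nrow = 2)
# element (1, 3) of a 2-row matrix has linear index (3 - 1) * 2 + 1 = 5
M[(3 - 1) * 2 + 1] == M[1, 3]  # TRUE
```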

Edit

A vast improvement is replacing the apply() + which.max() combo with max.col(), as suggested by markus:

foo2 <- function(Lambda) {
  nr <- nrow(Lambda)
  keep <- c(seq_len(nr), max.col(Lambda[, -1], ties.method = "first")*nr + seq_len(nr))
  replace(Lambda, -keep, 0L)  
}
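A quick sanity check on the question's example (foo2 repeated here so the snippet runs on its own):

```r
foo2 <- function(Lambda) {  # as defined above
  nr <- nrow(Lambda)
  keep <- c(seq_len(nr), max.col(Lambda[, -1], ties.method = "first") * nr + seq_len(nr))
  replace(Lambda, -keep, 0L)
}

set.seed(60)
Lambda <- matrix(sample.int(30), 5)
foo2(Lambda)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]   19    0   27    0    0    0
# [2,]   16   28    0    0    0    0
# [3,]    2    0    0   23    0    0
# [4,]    5    0    0    0    0   29
# [5,]   26   30    0    0    0    0
```

This reproduces the output shown in the question.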

(Updated) Benchmark:

set.seed(60)
Lambda = matrix(sample.int(36e5), ncol = 6)
bench::mark(
  foo(Lambda),
  proposal(Lambda),
  foo2(Lambda),
  relative = TRUE
)[1:5]

  expression         min median `itr/sec` mem_alloc
  <bch:expr>       <dbl>  <dbl>     <dbl>     <dbl>
1 foo(Lambda)       17.7   12.1      1.09      3.67
2 proposal(Lambda)  19.3   13.1      1         1   
3 foo2(Lambda)       1      1       13.5       3.75
s_baldur
  • Your already good approach can be further improved if you use `max.col(Lambda[,-1], ties.method = "first")` instead of `apply(Lambda[, -1], 1, which.max)` – markus Oct 16 '20 at 21:08
  • @markus wow the difference is night and day. Thank you, I had forgotten about `max.col()`. – s_baldur Oct 17 '20 at 15:23
  • Thank you very much both sindri_baldur and @markus. – iago Oct 19 '20 at 09:23
1

A vectorized version that I can think of:

vectorized <- function(Lambda) {
  max <- matrixStats::rowMaxs(Lambda, cols = -1)
  noreplace <- sweep(Lambda, 1, max, "==")
  noreplace[, 1] <- TRUE
  Lambda * noreplace
}

But it is not faster than the for loop by @Vandenman.
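The trick here is that sweep(Lambda, 1, max, "==") builds a logical mask that is TRUE wherever a cell equals its row's maximum over columns 2..p; forcing the first column to TRUE and multiplying then zeroes everything else (note that with ties, every tied maximum would survive). On the question's example (the function repeated so the snippet is self-contained; assumes matrixStats is installed):

```r
vectorized <- function(Lambda) {  # as defined above
  max <- matrixStats::rowMaxs(Lambda, cols = -1)  # per-row max, first column excluded
  noreplace <- sweep(Lambda, 1, max, "==")        # TRUE where a cell equals its row max
  noreplace[, 1] <- TRUE                          # always keep the first column
  Lambda * noreplace
}

set.seed(60)
Lambda <- matrix(sample.int(30), 5)
vectorized(Lambda)  # reproduces the matrix shown in the question
```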

F. Privé
  • Thank you very much! It works great. Actually, benchmarks at the size of the one by sindri_baldur or bigger show that your function is almost as fast as `foo2` by him and markus – iago Oct 19 '20 at 09:26