0

I have a matrix filled with somewhat random elements. I need every row sorted in decreasing order, then a function is called on the matrix, and finally the resulting matrix needs to be unsorted to the original order.

This is quickly accomplished vector-wise as shown here, but what's the fastest way to do this to every row in a matrix? Right now I'm doing:

# Example matrix
m <- matrix(runif(100), nrow = 25, ncol = 4)

# Get the initial order by row
om <- t(apply(m, 1, order, decreasing = T))

sm <- m
for (i in seq_len(nrow(m))) {
  sm[i, ] <- sm[i, om[i, ]]
}

# ** Operations performed on sm **

# Then unsort
for (i in seq_len(nrow(m))) {
  sm[i, ] <- sm[i, order(om[i, ])]
}

# sm is now sorted by-row in the same order as m

Is there some way given om in the above to sort and unsort while avoiding the for loop or an apply function (both of which make this operation very slow for big m). Thanks!

Edit: There are pointers here: Fastest way to sort each row of a large matrix in R The operation is done inside a function that is already called using parallel, so this operation must be done using serial code.

  • 1
    Have you seen [Fastest way to sort each row of a large matrix in R](https://stackoverflow.com/questions/9506442/fastest-way-to-sort-each-row-of-a-large-matrix-in-r) ? – Ronak Shah Mar 06 '20 at 10:16
  • @RonakShah thanks, yes I did. But that doesn't discuss the un-sorting of the matrix. I'll also edit my question to include that doing the operation in parallel isn't possible. – Brian Albert Monroe Mar 06 '20 at 10:21

1 Answers1

3

Row-wise sorting seems to be straightforward. To get the original order back (un-sort) we need the row-wise ranks rather than their order. Thereafter, what works for column sorting in @Josh O'Brien's answer we can adapt for rows.

Base R solution:

rr <- t(apply(m, 1, rank))  ## get initial RANKS by row
sm <- t(apply(m, 1, sort))  ## sort m

##  DOING STUFF HERE  ##

sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]  ## un-sort
all(m == sm)  ## check
# [1] TRUE

Seems to work.

In your linked answer, the rowSort function of the Rfast package stands out well in terms of performance, which may cover the sorting issue. Moreover there's also a rowRanks function that will cover our ranking issue. So we can avoid apply.

Let's try it out.

m[1:3, ]
#           [,1]      [,2]      [,3]        [,4]
# [1,] 0.9148060 0.5142118 0.3334272 0.719355838
# [2,] 0.9370754 0.3902035 0.3467482 0.007884739
# [3,] 0.2861395 0.9057381 0.3984854 0.375489965

library(Rfast)
rr <- rowRanks(m)  ## get initial RANKS by row
sm <- rowSort(m)   ## sort m
sm[1:3, ]  # check
#            [,1]      [,2]      [,3]      [,4]
# [1,] 0.36106962 0.4112159 0.6262453 0.6311956
# [2,] 0.01405302 0.2171577 0.5459867 0.6836634
# [3,] 0.07196981 0.2165673 0.5739766 0.6737271

##  DOING STUFF HERE  ##

sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]  ## un-sort
all(sm == m)  ## check
# [1] TRUE

Dito.

Benchmark

m.test <- matrix(runif(4e6), ncol = 4)
dim(m.test)
# [1] 1000000       4

# Unit: milliseconds
#     expr        min       lq       mean     median         uq        max neval cld
#    Rfast   897.6286   910.91   956.6259   924.1914   986.1246   1048.058     3 a  
#    baseR 87931.2824 88004.73 95659.8671 88078.1737 99524.1594 110970.145     3   c
#  forloop 58927.7784 59434.54 60317.3903 59941.2930 61012.1963  62083.100     3  b 

Not so bad!!


Data/Code:

set.seed(42)

m <- matrix(runif(100), nrow = 25, ncol = 4)

## benchmark
m.test <- matrix(runif(4e6), ncol = 4)

microbenchmark::microbenchmark(
  Rfast={
    rr <- rowRanks(m.test)
    sm <- rowSort(m.test)
    sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]},
  baseR={
    rr <- t(apply(m.test, 1, rank))
    sm <- t(apply(m.test, 1, sort))
    sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]
  },
  forloop={
    om <- t(apply(m.test, 1, order, decreasing = T))
    sm <- m.test
    for (i in seq_len(nrow(m.test))) {
      sm[i, ] <- sm[i, om[i, ]]
    }
    for (i in seq_len(nrow(m.test))) {
      sm[i, ] <- sm[i, order(om[i, ])]
    }
  }, times=3L
)
Community
  • 1
  • 1
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • This is helpful, thanks. I didn't know about the Rfast package. I was hoping to do it using only base R, but it looks like the speedup from Rcpp is very large in comparison. – Brian Albert Monroe Mar 06 '20 at 13:20
  • colSort and colRanks uses C++'s sort. Rcpp has a minor role in thise algorithms. Also check the parallel version of these functions. Way faster than sequential version. – Manos Papadakis Mar 08 '20 at 08:16
  • I didn't notice that it was done using C++ sort, that's interesting. Does the C++ sort perform noticeably better than Rcpp's sort? Like I say in my question, however, doing these operations in parallel isn't a possibility. – Brian Albert Monroe Mar 09 '20 at 13:29
  • Yes. C++'s Introsort is the heaviest and fastest sorting algorithm in this planet. At least for now. If you don't mind I ask, why not parallel? – Manos Papadakis Mar 11 '20 at 16:09
  • Thanks for the info Manos, the operation is done inside a function that is already called using parallel. There are no cores to spare for this operation, so making it parallel would just clobber the cache. – Brian Albert Monroe Mar 30 '20 at 11:11