I want to find the location of the minimum or maximum value of a data frame or a matrix.

For example, take the minimum of a matrix (ignoring ties for now):

B<-matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10),nrow=3,ncol=5)
B
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.5    4    4    1    6
[2,]  2.0    5    3    2    8
[3,]  3.0    5    2    4   10

The output I want is:

row.number = 1

column.number = 4

I tried which.min and which.max, but they only return the linear ("total") position, as if the input were a vector (for the minimum here, the single number 10).

Thanks in advance!

1 Answer


While which.min and friends do not support this directly, which(..., arr.ind=TRUE) does:

which(B == min(B), arr.ind=TRUE)
#      row col
# [1,]   1   4
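
Since the question also mentions data frames, here is a minimal sketch of the same idea, assuming every column is numeric. The data frame `df` and its column names are made up for illustration; it is converted with as.matrix() first so the comparison runs on a plain numeric matrix.

```r
# Hypothetical all-numeric data frame (not from the question)
df <- data.frame(a = c(1.5, 2, 3), b = c(4, 5, 5), c = c(1, 2, 4))

M   <- as.matrix(df)                        # numeric matrix with the same layout
pos <- which(M == min(M), arr.ind = TRUE)   # one row per matching cell

pos[1, "row"]               # row number of the (first) minimum
names(df)[pos[1, "col"]]    # column name of the (first) minimum
```

The same floating-point caveats discussed for the matrix case apply here as well.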

Very important side note: there are two caveats with this approach:

  1. This does not flag ties explicitly: if the minimum occurs more than once, the result simply has one row per occurrence; and

  2. This assumes that floating-point equality will work, which is prone to the problems described in "Why are these numbers not equal?" and R FAQ 7.31. So while this probably works most of the time, it is possible that it will not always work. When it doesn't work, it will return a 0-row matrix. One mitigating step would be to introduce a tolerance, such as

    which(abs(B - min(B)) < 1e-9, arr.ind=TRUE)
    #      row col
    # [1,]   1   4
    

    where 1e-9 is deliberately small, but "small" is relative to the range of expected values in the matrix.
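
To make both notes concrete, here is a rough sketch. It checks for ties by counting the rows of the result, and it scales the tolerance to the spread of the data instead of hard-coding 1e-9. `B_tie` and the scaling rule for `tol` are assumptions chosen for illustration, not a recommendation that fits every data set.

```r
B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow=3, ncol=5)

# Hypothetical variant of B with a second copy of the minimum
B_tie <- B
B_tie[3, 5] <- 1

ties <- which(B_tie == min(B_tie), arr.ind = TRUE)
nrow(ties) > 1    # TRUE means the minimum occurs more than once

# Assumed rule: tolerance proportional to the range of the values
tol <- sqrt(.Machine$double.eps) * diff(range(B_tie))
near_min <- which(abs(B_tie - min(B_tie)) < tol, arr.ind = TRUE)
```

If the matrix is constant, diff(range(...)) is 0 and this `tol` collapses to 0, so a real implementation would want a floor on the tolerance.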

Faster Alternative

Honestly, which.min (or which.max) gives you enough information, provided you know the dimensions of the matrix.

m <- which.min(B)
c( (m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1 )
# [1] 1 4

For background, a matrix in R is just a vector with dimensions, stored in column-major order (filled column by column).

matrix(1:15, nrow=3)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    4    7   10   13
# [2,]    2    5    8   11   14
# [3,]    3    6    9   12   15

So we can use the modulus %% and integer-division (floor) %/% to determine the row and column number, respectively:

(1:15-1) %% 3 + 1
#  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
(1:15-1) %/% 3 + 1
#  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
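
As a cross-check on the arithmetic above, base R also has arrayInd(), which converts a linear index into array indices given the dimensions. This sketch compares it with the manual calculation; the variable names `manual` and `builtin` are just for illustration.

```r
B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow=3, ncol=5)

m <- which.min(B)    # linear (column-major) index of the minimum

# Manual conversion with modulus and integer division
manual <- c((m - 1) %% nrow(B) + 1, (m - 1) %/% nrow(B) + 1)

# Base R helper; returns a 1 x 2 matrix of (row, column)
builtin <- arrayInd(m, dim(B))

all(as.vector(builtin) == manual)    # both approaches should agree
```

I have not benchmarked arrayInd() against the manual version, so no speed claim is made for it here.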

And it turns out that this last method is much faster (not too surprising, considering the hard part is done in C):

microbenchmark::microbenchmark(
  a = which(B == min(B), arr.ind=TRUE),             # first answer, imperfect
  b = which(abs(B - min(B)) < 1e-9, arr.ind=TRUE),  # second, technically more correct
  c = {                                             # third, still correct, faster
    m <- which.min(B)
    c( (m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1 )
  }, times=10000)
# Unit: microseconds
#  expr min  lq     mean median   uq   max neval
#     a 8.4 9.0 10.27770    9.5 10.4  93.5 10000
#     b 9.0 9.6 10.94061   10.3 11.1 158.4 10000
#     c 3.3 4.0  4.48250    4.2  4.7  38.7 10000
r2evans
  • Awesome! Thank you so much! – Charlotte Deng Oct 25 '19 at 22:16
  • CharlotteDeng, I thought about it a moment longer and added what I think is a very important note. I believe that many data-manglers do not appreciate the nuances of floating-point equality in programming languages (including but not exclusively R), so I *do* suggest that you take at least 3 minutes to read those two links. In general R tries to do the "right-enough" thing, but it will fail when you need it the most (or forget about this comment). – r2evans Oct 26 '19 at 02:30
  • One could let `which.min()` handle the nuance of numerical equality, and use `idx <- which.min(B)` and then recover row and column index `c(row(B)[idx], col(B)[idx])` – Martin Morgan Oct 26 '19 at 10:47
  • @MartinMorgan, I wasn't as certain until I looked at the source, and you are right: since it only does tests of `<` (https://github.com/wch/r-source/blob/e85c4a3b16e17418199348246563746df3c58afe/src/main/summary.c#L1052), it is safe from the woes of floating-point equality. – r2evans Oct 26 '19 at 14:14
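
For reference, the row()/col() suggestion from the comments can be sketched as follows, using the matrix from the question. The name `loc` and the element names are added only for readability.

```r
B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow=3, ncol=5)

idx <- which.min(B)    # first position of the minimum, no equality test needed

# row(B) and col(B) are matrices holding each cell's row and column number
loc <- c(row = row(B)[idx], col = col(B)[idx])
loc
```

Note that which.min() returns only the first minimum, so ties are not visible with this approach, and row(B) and col(B) each build a full index matrix the size of B.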