Suppose we have a matrix like this one:

# Set seed
  set.seed(12345)
# Generate the matrix (named df, but it is a matrix, not a data.frame)
  df <- matrix(sample(1:100,100), nrow = 10)

I would like to get the row and column of each of the n highest values.

I know that using which(df == max(df), arr.ind=TRUE) I get what I want but only for the highest value.

Let's suppose we want the location of the 5 highest values in the matrix. Based on the previous answer, I tried which(aux %in% sort(df, decreasing=T)[1:5], arr.ind = TRUE) but it did not work.

I also know that I can use order(df, decreasing=T) and apply modular arithmetic to the result to get the rows and columns I am looking for. Nevertheless, I think there should be a faster way to get them.

Thank you for your help in advance

R18
    Quick and dirty solution, find everything that's greater or equal to the lowest of your top 5 values: `which(df >= min(sort(df, decreasing=T)[1:5]), arr.ind=TRUE)` – rps1227 May 30 '23 at 06:51

4 Answers

You can use match() and arrayInd():

vals <- head(sort(df, decreasing = TRUE), 5)

cbind(vals, arrayInd(match(vals, df), dim(df), useNames = TRUE))

     vals row col
[1,]  100   8   3
[2,]   99   9   9
[3,]   98   4   8
[4,]   97   7   9
[5,]   96   3   2
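
One caveat worth keeping in mind: match() returns only the first position of each value, so this approach relies on the top values being distinct (as they are in the question's matrix, which is a permutation of 1:100). The sketch below uses a small made-up matrix `m` with a repeated maximum to show the difference from an order()-based lookup; the data and variable names are illustrative assumptions, not part of the original answer.

```r
# Hypothetical example data: the value 9 appears twice
m <- matrix(c(1, 9, 3,
              9, 2, 5), nrow = 2, byrow = TRUE)

vals <- head(sort(m, decreasing = TRUE), 3)   # 9, 9, 5

# match() reports the first occurrence of each value,
# so both 9s map to the same linear position
pos_match <- match(vals, m)

# order() returns one distinct linear position per element
pos_order <- head(order(m, decreasing = TRUE), 3)

arrayInd(pos_match, dim(m))   # the same cell is listed twice
arrayInd(pos_order, dim(m))   # three different cells
```

If ties are possible in your data, an order()-based variant avoids reporting the same cell more than once.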
Ritchie Sacramento

You can use quantile.

which(df >= quantile(df, 1 - 5/length(df)), arr.ind=TRUE)
#     row col
#[1,]   3   2
#[2,]   8   3
#[3,]   4   8
#[4,]   7   9
#[5,]   9   9

If there are ties, the result may contain more than 5 positions. To get exactly 5, order the candidates by value and keep the first 5:

i <- which(df >= quantile(df, 1 - 5/length(df)))
arrayInd(i[order(df[i], decreasing = TRUE)][1:5], dim(df))
#     [,1] [,2]
#[1,]    8    3
#[2,]    9    9
#[3,]    4    8
#[4,]    7    9
#[5,]    3    2

Using the tdigest package might speed up the quantile search.

Or take the head of order() and convert the linear indices to rows and columns with %% and %/%.

. <- head(order(df, decreasing = TRUE), 5) - 1
cbind(. %% dim(df)[[1]], . %/% dim(df)[[1]]) + 1
#     [,1] [,2]
#[1,]    8    3
#[2,]    9    9
#[3,]    4    8
#[4,]    7    9
#[5,]    3    2
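
The arithmetic works because R stores matrices in column-major order: the zero-based linear index modulo the number of rows gives the zero-based row, and integer division by the number of rows gives the zero-based column. Here is a minimal sketch of that conversion wrapped in a helper; the function name `linear_to_rowcol` and the small matrix `m` are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: convert column-major linear indices to (row, col)
linear_to_rowcol <- function(idx, nr) {
  zero_based <- idx - 1L
  cbind(row = zero_based %% nr + 1L,    # position within the column
        col = zero_based %/% nr + 1L)   # number of full columns skipped, plus one
}

m   <- matrix(1:12, nrow = 3)           # 3 rows, 4 columns
idx <- c(1L, 3L, 4L, 12L)
rc  <- linear_to_rowcol(idx, nrow(m))

# Cross-check against base R's arrayInd()
all(rc == arrayInd(idx, dim(m)))
```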

Or transform the indices with arrayInd.

arrayInd(head(order(df, decreasing = TRUE), 5), dim(df))
#     [,1] [,2]
#[1,]    8    3
#[2,]    9    9
#[3,]    4    8
#[4,]    7    9
#[5,]    3    2

An external package might speed up the ordering step, for example collapse::radixorderv.

. <- head(collapse::radixorderv(df, decreasing = TRUE), 5) - 1
cbind(. %% dim(df)[[1]], . %/% dim(df)[[1]]) + 1
GKi

Your method works. You just need to convert the result of %in%, which is a plain vector, to a 2-dimensional array before using which:

which(array(df %in% tail(sort(df), 5), dim(df)), TRUE)

     row col
[1,]   3   2
[2,]   8   3
[3,]   4   8
[4,]   7   9
[5,]   9   9
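
To see why the conversion is needed, compare what `==` and `%in%` return for a matrix. The following is a small sketch on made-up data (the matrix `m` and the names `top` and `hit` are assumptions for illustration): `==` keeps the dim attribute, while `%in%` drops it, so `which(..., arr.ind = TRUE)` has no shape to map positions onto.

```r
# Hypothetical 2 x 3 example matrix
m   <- matrix(c(4, 8, 1, 7, 3, 9), nrow = 2)
top <- tail(sort(m), 2)            # the two largest values

is.null(dim(m == max(m)))          # FALSE: == preserves the matrix shape
is.null(dim(m %in% top))           # TRUE: %in% returns a plain logical vector

# Restore the shape by copying the dim attribute; arr.ind then works
hit <- m %in% top
dim(hit) <- dim(m)
res <- which(hit, arr.ind = TRUE)
res
```

Assigning `dim(hit) <- dim(m)` is an alternative to wrapping the vector in array(); both give which() a 2-dimensional logical object.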
Onyambu
  • So what I did returned a vector, and I had to transform it into a `matrix` or an `array` in order to be able to use `which`. – R18 May 30 '23 at 07:17

A base R option with expand.grid + order

> expand.grid(lapply(dim(df), seq))[order(-c(df)), ][1:5, ]
   Var1 Var2
28    8    3
89    9    9
74    4    8
87    7    9
13    3    2
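
This works because expand.grid varies its first factor fastest, which matches the column-major order of c(df): row k of the grid holds the (row, column) pair of the k-th matrix element. A minimal sketch on a small made-up matrix (the names `m`, `grid`, and `top2` are illustrative assumptions) checks that correspondence:

```r
# Hypothetical 2 x 3 example matrix
m <- matrix(c(10, 40, 30, 20, 60, 50), nrow = 2)

# One grid row per matrix cell, with the row index varying fastest
grid <- expand.grid(row = seq_len(nrow(m)), col = seq_len(ncol(m)))

# Indexing m by the grid reproduces the flattened matrix
all(m[as.matrix(grid)] == c(m))

# Positions of the two largest values
top2 <- grid[order(-c(m)), ][1:2, ]
top2
```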

Here is a data.table version, where both the values and indices are presented

library(data.table)

setorder(
    data.table(
        val = c(df),
        CJ(
            col = 1:ncol(df),
            row = 1:nrow(df)
        )
    ), -val
)[1:5]

which gives

   val col row
1: 100   3   8
2:  99   9   9
3:  98   8   4
4:  97   9   7
5:  96   2   3
ThomasIsCoding