
Suppose I have a data frame (df1):

df1:

v1    v2    v3    v4
--    --    --    --
4.1   1.2   12    1.4
14    18.4  15.1  6.9

I want to find the nth largest value in each row, along with the name of the column that holds that value.

For example, say I want the second largest value in each row and the corresponding column name. The output (df2) should then be:

df2:

value   col_name
---     --------
4.1     v1
15.1    v3

How can I do this in R? I would be very grateful for any help. Thanks a lot.

oercim

2 Answers


This is rough, but gets the job done:

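# Second largest value in each row: the first of the last two sorted values
second_largest <- apply(df1, 1, FUN = function(x) tail(sort(x), 2)[1])
# Locate those values; which() reports matches column by column, so re-sort by row
# (assumes no tied values within a row)
hits <- which(df1 == second_largest, arr.ind = TRUE)
cols <- hits[order(hits[, "row"]), "col"]

df2 <- data.frame(value = second_largest,
                  col_name = colnames(df1)[cols])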

# df2
#   value col_name
# 1   4.1       v1
# 2  15.1       v3

dplyr and tidyr alternative:

library(dplyr)
library(tidyr)

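df1 %>%
  mutate(row = row_number()) %>%   # remember which row each value came from
  gather(col, val, -row) %>%       # long format: one (row, col, val) triple per line
  group_by(row) %>%
  arrange(val) %>%                 # ascending, so the smaller of the top two comes first
  top_n(2, val) %>%                # keep the two largest values in each row
  do(head(., 1))                   # the first of those is the second largest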
JasonAizkalns

A similar but slightly different approach. If your data is large, this might be somewhat faster; otherwise I doubt you'll notice any real difference.

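n = 2L                     # which largest value to pick (2 = second largest)
mat = as.matrix(df1)
# rank(-x) gives the largest value rank 1, so rank n is the nth largest
# (assumes no tied values within a row)
ind = apply(mat, 1, FUN = function(x) which(rank(-x) == n))
# index the matrix with (row, column) pairs to pull out the values
data.frame(value = mat[cbind(1:nrow(mat), ind)], col_name = colnames(mat)[ind])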
#   value col_name
# 1   4.1       v1
# 2  15.1       v3
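
If a row can contain tied values, both exact matching with `==` and `rank()` can misbehave (extra matches, or no match at all). Below is a minimal sketch of one way around that, using `order()` instead. The helper name `nth_largest` is hypothetical and not part of either answer, and breaking ties in favour of the leftmost column is an assumption you may want to change.

```r
# Hypothetical helper (not from the answers above): nth largest value per row.
nth_largest <- function(df, n = 2L) {
  mat <- as.matrix(df)
  stopifnot(is.numeric(mat), n >= 1L, n <= ncol(mat))
  # For each row, order() lists column positions by decreasing value;
  # tied values keep their original left-to-right order.
  ind <- apply(mat, 1, function(x) order(x, decreasing = TRUE)[n])
  data.frame(value = mat[cbind(seq_len(nrow(mat)), ind)],
             col_name = colnames(mat)[ind])
}

# The example data from the question
df1 <- data.frame(v1 = c(4.1, 14), v2 = c(1.2, 18.4),
                  v3 = c(12, 15.1), v4 = c(1.4, 6.9))
res <- nth_largest(df1, 2L)
print(res)
#   value col_name
# 1   4.1       v1
# 2  15.1       v3
```

Because `order()` always returns exactly one position per row, the result has one line per row even when values repeat.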
Gregor Thomas