Determine the column with maximum value for each data frame row, accounting for ties

Question

I have a data.frame with several columns, where the values are integers. For example:

set.seed(1)

df <- data.frame(s1 = as.integer(runif(10,0,10)),
                 s2 = as.integer(runif(10,0,10)),
                 s3 = as.integer(runif(10,0,10)))

My question is how to efficiently add a column to this data.frame that will label the column that has the maximum value for each row, but if there are ties the label will be NA.

The slow way of doing this:

df$max <- sapply(1:nrow(df), function(r){
  max.idx <- which(df[r,] == max(df[r,]))
  if(length(max.idx) == 1){
    max.label <- colnames(df)[max.idx]
  } else{
    max.label <- NA
  }
  max.label
})

> df
   s1 s2 s3  max
1   2  2  9   s3
2   3  1  2   s1
3   5  6  6 <NA>
4   9  3  1   s1
5   2  7  2   s2
6   8  4  3   s1
7   9  7  0   s1
8   6  9  3   s2
9   6  3  8   s3
10  0  7  3   s2

I'm looking for something faster for a much larger data.frame

googling your verbatim question, top result. https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value — tjebo, Feb 09 '21 at 18:10

akrun · Accepted Answer · 2021-02-09T18:27:30.080

2

We can use max.col to get the index of the column with max value in each row. Then, to change those with ties for max to NA, an option is to do a comparision with the max value in each row with that of the dataset, convert to logical vector with rowSums and replace those elements in mx with ties to NA

mx <- names(df)[max.col(df, "first")]
df$max <- replace(mx, rowSums(do.call(pmax, df) == df) > 1, NA)

-output

df
#   s1 s2 s3  max
#1   2  2  9   s3
#2   3  1  2   s1
#3   5  6  6 <NA>
#4   9  3  1   s1
#5   2  7  2   s2
#6   8  4  3   s1
#7   9  7  0   s1
#8   6  9  3   s2
#9   6  3  8   s3
#10  0  7  3   s2

edited Feb 09 '21 at 18:27

answered Feb 09 '21 at 18:07

akrun

874,273
37
540
662

Thanks for the answer @akrun. But I see that `max.col` doesn't have an option to return `NA` in case of ties (its documentation says: `ties.method = c("random", "first", "last")`). Any alternative? – dan Feb 09 '21 at 18:12

Determine the column with maximum value for each data frame row, accounting for ties

1 Answers1