1

I have a data.frame with several columns, where the values are integers. For example:

set.seed(1)

df <- data.frame(s1 = as.integer(runif(10,0,10)),
                 s2 = as.integer(runif(10,0,10)),
                 s3 = as.integer(runif(10,0,10)))

My question is how to efficiently add a column to this data.frame that will label the column that has the maximum value for each row, but if there are ties the label will be NA.

The slow way of doing this:

df$max <- sapply(1:nrow(df), function(r){
  max.idx <- which(df[r,] == max(df[r,]))
  if(length(max.idx) == 1){
    max.label <- colnames(df)[max.idx]
  } else{
    max.label <- NA
  }
  max.label
})

> df
   s1 s2 s3  max
1   2  2  9   s3
2   3  1  2   s1
3   5  6  6 <NA>
4   9  3  1   s1
5   2  7  2   s2
6   8  4  3   s1
7   9  7  0   s1
8   6  9  3   s2
9   6  3  8   s3
10  0  7  3   s2

I'm looking for something faster for a much larger data.frame

dan
  • 6,048
  • 10
  • 57
  • 125
  • googling your verbatim question, top result. https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value – tjebo Feb 09 '21 at 18:10

1 Answers1

2

We can use max.col to get the index of the column with max value in each row. Then, to change those with ties for max to NA, an option is to do a comparision with the max value in each row with that of the dataset, convert to logical vector with rowSums and replace those elements in mx with ties to NA

mx <- names(df)[max.col(df, "first")]
df$max <- replace(mx, rowSums(do.call(pmax, df) == df) > 1, NA)

-output

df
#   s1 s2 s3  max
#1   2  2  9   s3
#2   3  1  2   s1
#3   5  6  6 <NA>
#4   9  3  1   s1
#5   2  7  2   s2
#6   8  4  3   s1
#7   9  7  0   s1
#8   6  9  3   s2
#9   6  3  8   s3
#10  0  7  3   s2
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks for the answer @akrun. But I see that `max.col` doesn't have an option to return `NA` in case of ties (its documentation says: `ties.method = c("random", "first", "last")`). Any alternative? – dan Feb 09 '21 at 18:12