I have a data.frame with several columns, where the values are integers. For example:
set.seed(1)
df <- data.frame(s1 = as.integer(runif(10, 0, 10)),
                 s2 = as.integer(runif(10, 0, 10)),
                 s3 = as.integer(runif(10, 0, 10)))
My question is how to efficiently add a column to this data.frame that labels, for each row, the column holding the maximum value. If the maximum is tied between two or more columns, the label should be NA.
The slow way of doing this:
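df$max <- sapply(seq_len(nrow(df)), function(r){
  # positions of all columns that attain the maximum of row r
  max.idx <- which(df[r,] == max(df[r,]))
  if(length(max.idx) == 1){
    # unique maximum: use that column's name
    max.label <- colnames(df)[max.idx]
  } else{
    # tie: no label
    max.label <- NA
  }
  max.label
})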
> df
   s1 s2 s3  max
1   2  2  9   s3
2   3  1  2   s1
3   5  6  6 <NA>
4   9  3  1   s1
5   2  7  2   s2
6   8  4  3   s1
7   9  7  0   s1
8   6  9  3   s2
9   6  3  8   s3
10  0  7  3   s2
I'm looking for something faster that scales to a much larger data.frame.
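For comparison, here is a minimal vectorized sketch of the same labeling rule. It is one possible approach, not a benchmarked solution. It assumes every column is numeric and free of NA values, and the names `m`, `idx`, `row.max` and `n.max` are only illustrative. It converts the data.frame to a matrix once, finds the first maximum in each row with `max.col`, and then counts how many columns attain that maximum so that tied rows can be set to NA.

```r
set.seed(1)
df <- data.frame(s1 = as.integer(runif(10, 0, 10)),
                 s2 = as.integer(runif(10, 0, 10)),
                 s3 = as.integer(runif(10, 0, 10)))

# Work on a matrix so comparisons are done on whole columns at once
m <- as.matrix(df)

# Column index of the first maximum in each row (exact comparison)
idx <- max.col(m, ties.method = "first")

# The maximum value of each row, picked out by (row, column) index pairs
row.max <- m[cbind(seq_len(nrow(m)), idx)]

# Number of columns attaining the row maximum; the vector row.max is
# recycled down the columns, so each row is compared with its own maximum
n.max <- rowSums(m == row.max)

# Label rows with a unique maximum, NA for ties
df$max <- ifelse(n.max == 1, colnames(m)[idx], NA_character_)
```

The matrix is built before `df$max` is assigned, so the new label column never takes part in the comparison. If the data.frame also holds non-numeric columns, they would need to be dropped before the `as.matrix` call.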