I am trying to get the index of the column that has the highest value among selected columns. When trying with dplyr
, my attempts are not giving me the right result.
library(dplyr);library(magrittr)
DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>%
mutate(max_ind = which.max(c(.$Factor1,.$Factor2,.$Factor3))) %>% print
Factor1 Factor2 Factor3 max_ind
1 1 3 9 7
2 2 1 1 7
3 4 1 0 7
Where is the mistake? Why is dplyr
behaving this way. I should probably use rowwise
, but that does not seem to be the best way to go. Any thought of how to do this in base
, tidyverse
or data.table
?
Edit-1 (some other attempt)
With sapply I am getting this:
DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>%
+ mutate(max_ind = which.max(c(Factor1,Factor2,Factor3)),
+ max_ind2 = sapply(X = ., function(x) which.max(c(x[Factor1],x[Factor2],x[Factor3])))) %>% print
Factor1 Factor2 Factor3 max_ind max_ind2
1 1 3 9 7 4
2 2 1 1 7 1
3 4 1 0 7 1
But here I see 4 in the first row while it should be 3.
Edit-2
I am also looking for a solution where we can specify the columns to be used for the comparison (which.max
)
Edit-3
All of base
, purrr::map
and dplyr::mutate
examples work.
#R>DF1 <- data.frame(Factor1 = c(1,2,4,1),Factor2 = c(3,1,1,6),Factor3 = c(9,1,0,4))
#R>DF1 %>% mutate(max_ind_purrr = pmap(.l = list(Factor1,Factor2,Factor3),~which.max(c(...)))) %>% print()
Factor1 Factor2 Factor3 max_ind_purrr
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
4 1 6 4 2
#R>DF1 %>% mutate(max_ind_dplyr=max.col(DF1[,1:3]))
Factor1 Factor2 Factor3 max_ind_dplyr
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
4 1 6 4 2
#R>DF1 <- transform(DF1,max_ind_base=apply(DF1[, c('Factor1','Factor2','Factor3')],1,which.max))%>% print
Factor1 Factor2 Factor3 max_ind_base
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
4 1 6 4 2