Adding a new variable that indicates which existing variable has the maximum value for each row

Question

I have the following data frame:

mydf <- data.frame(label = c("A", "B", "C"),
Var1 = c(0.07635660, 0.22186266, -0.13299621),
Var2 = c(0.25517996, 0.65896751, 0.32703359),
Var3 = c(0.63174426, 0.21518955, 0.47102852))

And for each row, I want to add a new variable that would return the name of the variable for which it has the maximum value:

mydf_end_goal <- data.frame(label = c("A", "B", "C"),
Var1 = c(0.07635660, 0.22186266, -0.13299621),
Var2 = c(0.25517996, 0.65896751, 0.32703359),
Var3 = c(0.63174426, 0.21518955, 0.47102852),
Max = c("Var3", "Var2", "Var3"))

What would be the most efficient way of doing this, preferably using dplyr or purrr? Right now, the best I can come up with is a series of ifelse conditions, which gets really annoying as I have more variables than in my toy example above:

mydf %>% 
 rowwise() %>% 
 mutate(Max = ifelse(Var1 > Var2 & Var1 > Var3, "Var1", 
                     ifelse(Var2 > Var1 & Var2 > Var3, "Var2", "Var3")))

score 4 · Answer 1 · edited Mar 18 '17 at 06:52

You can do this in base R, without any package:

mydf$MaxVar <- colnames(mydf)[apply(mydf[-1], 1, which.max) +1]

mydf
#  label       Var1      Var2      Var3 MaxVar
#1     A  0.0763566 0.2551800 0.6317443   Var3
#2     B  0.2218627 0.6589675 0.2151896   Var2
#3     C -0.1329962 0.3270336 0.4710285   Var3
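To make the `+1` offset less mysterious, here is the same idea as a step-by-step sketch. It assumes every column except `label` should be compared; the names `vals` and `idx` are only illustrative. Indexing into the names of the subset itself means no offset is needed:

```r
mydf <- data.frame(label = c("A", "B", "C"),
                   Var1 = c(0.07635660, 0.22186266, -0.13299621),
                   Var2 = c(0.25517996, 0.65896751, 0.32703359),
                   Var3 = c(0.63174426, 0.21518955, 0.47102852))

# Keep only the columns to compare (assumption: everything except `label`).
vals <- mydf[-1]

# For each row, which.max() gives the position of the first maximum within `vals`.
idx <- apply(vals, 1, which.max)

# Look the positions up in the names of `vals`, so no "+ 1" correction is required.
mydf$MaxVar <- names(vals)[idx]
```

One caveat: `which.max()` returns an empty result for a row that is entirely `NA`, so data with missing values would probably need extra handling.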

answered Mar 18 '17 at 06:44 by Marcelo · edited Mar 18 '17 at 06:52 by jogo

score 4 · Accepted Answer · answered Mar 18 '17 at 07:26

No loop needed. You can simply use max.col,

mydf$max1 <- names(mydf)[max.col(mydf[-1])+1]

mydf
#  label       Var1      Var2      Var3 max1
#1     A  0.0763566 0.2551800 0.6317443 Var3
#2     B  0.2218627 0.6589675 0.2151896 Var2
#3     C -0.1329962 0.3270336 0.4710285 Var3
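One thing worth knowing: `max.col()` breaks ties at random by default (`ties.method = "random"`), so rows with equal maxima can give different answers between runs. A small sketch with made-up data (the `tied` data frame is hypothetical, not from the question) showing how `ties.method = "first"` makes the result reproducible:

```r
# Hypothetical data: row A has a tie between Var1 and Var2.
tied <- data.frame(label = c("A", "B"),
                   Var1 = c(1, 0),
                   Var2 = c(1, 2),
                   Var3 = c(0, 1))

# "first" always picks the left-most maximum in each row.
# names(tied)[-1] drops `label`, so the positions line up without an offset.
tied$max1 <- names(tied)[-1][max.col(tied[-1], ties.method = "first")]
```

Here row A resolves to `Var1` every time; with the default it could come out as either `Var1` or `Var2`.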

answered Mar 18 '17 at 07:26 by Sotos

I was not aware of `max.col()` - great concise answer. – Phil Mar 18 '17 at 07:31

score 1 · Answer 3 · edited Mar 18 '17 at 08:41

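This is not necessarily the most efficient way, but here is one way to do it using dplyr and purrr:

library(dplyr)
library(purrr)

# pmap_chr() calls the function once per row, passing the columns as named arguments
mydf <- mydf %>%
  mutate(Max = select(., -label) %>%
    pmap_chr(function(...)
      names(which.max(c(...))[1])
  ))

Or using max.col:

# Inside the braces, `.` is the data frame without `label`, so names and positions line up
mydf <- mydf %>%
  mutate(Max = select(., -label) %>%
    {names(.)[max.col(.)]}
  )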
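As a possible variant of the `max.col` approach, the columns can be picked by type instead of by excluding `label` by name, which may be handier with many variables. This is only a sketch: it assumes dplyr 1.0.0 or later is installed and that every numeric column is a candidate; `num_cols` is an illustrative name. It also pins `ties.method = "first"` so ties are not broken at random:

```r
library(dplyr)

mydf <- data.frame(label = c("A", "B", "C"),
                   Var1 = c(0.07635660, 0.22186266, -0.13299621),
                   Var2 = c(0.25517996, 0.65896751, 0.32703359),
                   Var3 = c(0.63174426, 0.21518955, 0.47102852))

# Assumption: all numeric columns should be compared with each other.
num_cols <- select(mydf, where(is.numeric))

# max.col() gives, per row, the position of the maximum within `num_cols`;
# looking that up in names(num_cols) yields the column name.
mydf <- mydf %>%
  mutate(Max = names(num_cols)[max.col(num_cols, ties.method = "first")])
```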

answered Mar 18 '17 at 08:17 by Nick Kennedy · edited Mar 18 '17 at 08:41
