Add new column with name of max column in data frame

Question

So I've got this example df:

df <- dput(structure(list(arts = structure(c(1L, 1L, 3L, 4L), .Label = c("art1", "art2"), class = "character"), scr1 = c(52L, 58L, 40L, 62L), scr2 = c(25L, 23L, 55L, 26L), scr3 = c(36L, 60L, 19L, 22L)), .Names = c("art_id", "scr1", "scr2", "scr3"), row.names = c(NA, -4L), class = "data.frame"))

> df
  art_id scr1 scr2 scr3
1      1   52   25   36
2      1   58   23   60
3      3   40   55   19
4      4   62   26   22

That I'm using dplyr to summarize by art_id

df %>% 
  group_by(art_id) %>% 
  summarise_each(funs(sum))

  art_id  scr1  scr2  scr3
   <int> <int> <int> <int>
1      1   110    48    96
2      3    40    55    19
3      4    62    26    22

My question: How can I add another column called top_r that contains the column name from the maximum among src1:src3. The resultant df would look like:

  art_id  scr1  scr2  scr3  top_r
   <int> <int> <int> <int>  <char>
1      1   110    48    96   scr1  
2      3    40    55    19   scr2  
3      4    62    26    22   scr1

I am comfortable using dplyr so if there's an answer that uses that library that's wonderful!

@Henrik I don't think this is a duplicate. There are columns that should not be considered (row identifiers) which makes the situation here slightly different and the most accepted answer from the other thread cannot be used in this case. It would even be wrong, because it would also evaluate `art_id` in this example — tjebo, Dec 18 '18 at 10:15

score 5 · Answer 1 · answered Apr 17 '17 at 17:23

5

Its just this simple in base R using max.col:

df$top_r <- names(df)[-1][max.col(df[-1])]

answered Apr 17 '17 at 17:23

989

12,579
5
31
53

1

whoa - that _is_ simple. Thank you! – jmb277 Apr 17 '17 at 17:26

score 2 · Accepted Answer · answered Apr 17 '17 at 17:07

2

This would work:

df %>%
  group_by(art_id) %>%
  summarise_each(funs(sum)) %>%
  mutate(top_r=apply(.[,2:4], 1, function(x) names(x)[which.max(x)]))

# A tibble: 3 × 5
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1

answered Apr 17 '17 at 17:07

Pierre Lapointe

16,017
2
43
56

What if you wanted to add the max value instead of the column name? – zsad512 Nov 22 '17 at 23:55
1

@zsad512 The last two lines would be `summarise_all(funs(sum)) %>% mutate(max=apply(.[,2:4], 1, max))` – Pierre Lapointe Nov 23 '17 at 01:25

score 0 · Answer 3 · answered Apr 17 '17 at 17:09

library(dplyr)
library(tidyr)

df2 <- df %>% 
  group_by(art_id) %>% 
  summarise_each(funs(sum))

df3 <- df2 %>%
  gather(top_r, Value, -art_id) %>%
  arrange(art_id, desc(Value)) %>%
  group_by(art_id) %>%
  slice(1) %>%
  select(-Value)

df_final <- df2 %>%
  left_join(df3, by = "art_id")

df_final
# A tibble: 3 × 5
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1

Add new column with name of max column in data frame

3 Answers3

Linked