5

So I've got this example df:

df <- dput(structure(list(arts = structure(c(1L, 1L, 3L, 4L), .Label = c("art1", "art2"), class = "character"), scr1 = c(52L, 58L, 40L, 62L), scr2 = c(25L, 23L, 55L, 26L), scr3 = c(36L, 60L, 19L, 22L)), .Names = c("art_id", "scr1", "scr2", "scr3"), row.names = c(NA, -4L), class = "data.frame"))

> df
  art_id scr1 scr2 scr3
1      1   52   25   36
2      1   58   23   60
3      3   40   55   19
4      4   62   26   22

That I'm using dplyr to summarize by art_id

df %>% 
  group_by(art_id) %>% 
  summarise_each(funs(sum))

  art_id  scr1  scr2  scr3
   <int> <int> <int> <int>
1      1   110    48    96
2      3    40    55    19
3      4    62    26    22

My question: How can I add another column called top_r that contains the column name from the maximum among src1:src3. The resultant df would look like:

  art_id  scr1  scr2  scr3  top_r
   <int> <int> <int> <int>  <char>
1      1   110    48    96   scr1  
2      3    40    55    19   scr2  
3      4    62    26    22   scr1  

I am comfortable using dplyr so if there's an answer that uses that library that's wonderful!

tjebo
  • 21,977
  • 7
  • 58
  • 94
jmb277
  • 558
  • 4
  • 19
  • 2
    @Henrik I don't think this is a duplicate. There are columns that should not be considered (row identifiers) which makes the situation here slightly different and the most accepted answer from the other thread cannot be used in this case. It would even be wrong, because it would also evaluate `art_id` in this example – tjebo Dec 18 '18 at 10:15

3 Answers3

5

Its just this simple in base R using max.col:

df$top_r <- names(df)[-1][max.col(df[-1])]
989
  • 12,579
  • 5
  • 31
  • 53
2

This would work:

df %>%
  group_by(art_id) %>%
  summarise_each(funs(sum)) %>%
  mutate(top_r=apply(.[,2:4], 1, function(x) names(x)[which.max(x)]))

# A tibble: 3 × 5
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
0
library(dplyr)
library(tidyr)

df2 <- df %>% 
  group_by(art_id) %>% 
  summarise_each(funs(sum))

df3 <- df2 %>%
  gather(top_r, Value, -art_id) %>%
  arrange(art_id, desc(Value)) %>%
  group_by(art_id) %>%
  slice(1) %>%
  select(-Value)

df_final <- df2 %>%
  left_join(df3, by = "art_id")

df_final
# A tibble: 3 × 5
  art_id  scr1  scr2  scr3 top_r
   <int> <int> <int> <int> <chr>
1      1   110    48    96  scr1
2      3    40    55    19  scr2
3      4    62    26    22  scr1
www
  • 38,575
  • 12
  • 48
  • 84