Create a new column in tibble to summarise largest row of selected elements

Question

I've a dataframe football_supporters that looks like this:

Mun_Code   Code   Municipality   Club       Coca_Cola    Fanta    Sprite
   3        11        NYC       Sharks        152          92       148
   3        12        NYC       Wolfs          93          72        54
   3        13        NYC       Dogs          143          67        38
   3        14        NYC       Cats           32          99       156
#With 200 more rows

I would like to create a new column through the mutate() function and summarise what each club's favourite Drink is like down below:

Acode     Code     Clubname    Drink
 11       0111      Sharks     Coca_cola
 12       0112      Wolfs      Coca_cola
 13       0113      Dogs       Coca_cola
 14       0114      Cats       Sprite
#With 200 more rows

I've tried

colMax <- function(football_supporters) sapply(football_supporters, max, na.rm = TRUE)

and it doesn't work.

[For each row return the column name of the largest value](https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value) — Henrik, Jul 23 '22 at 09:41

Maël · Answer 1 · 2022-07-23T09:05:24.510

2

You can select the indices of the desired columns, and then use max.col, to get the highest column.

Note that you can use .keep = "unused" to remove all the columns that were used in the mutate call.

library(dplyr)

which_drink <- which(colnames(football_supporters) %in% c("Coca_Cola", "Fanta", "Sprite"))

football_supporters %>% 
  mutate(Drink = colnames(.[which_drink])[max.col(across(which_drink))], 
         .keep = "unused")

  Mun_Code Code Municipality   Club     Drink
1        3   11          NYC Sharks Coca_Cola
2        3   12          NYC  Wolfs Coca_Cola
3        3   13          NYC   Dogs Coca_Cola
4        3   14          NYC   Cats    Sprite

edited Jul 23 '22 at 09:05

answered Jul 23 '22 at 08:58

Maël

45,206
3
29
67

1

We could even skip the `which_drink` and do: `mutate(Drink = names(across(Coca_Cola:Sprite))[max.col(across(Coca_Cola:Sprite))], .keep = "unused")` or define `which_drink <- c("Coca_Cola", "Fanta", "Sprite")` and use it inside `across` as `all_of(which_drink)`. – harre Jul 23 '22 at 17:42

TarJae · Answer 2 · 2022-07-23T12:15:23.047

1

Updated code thanks to @Darren Tsai:

We could do it this way:

get long format
arrange and finally select the first:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(c(Coca_Cola, Fanta, Sprite)) %>% 
  group_by(Mun_Code, Code, Club) %>% 
  arrange(-value, .by_group = TRUE) %>% 
  slice(1) %>% 
  select(-c(value, Municipality))

   Mun_Code  Code Club   name     
     <int> <int> <chr>  <chr>    
1        3    11 Sharks Coca_Cola
2        3    12 Wolfs  Coca_Cola
3        3    13 Dogs   Coca_Cola
4        3    14 Cats   Sprite

edited Jul 23 '22 at 12:15

answered Jul 23 '22 at 08:59

TarJae

72,363
6
19
66

1

What's the purpose of `summarise(sum = sum(value))`? I can get the same output without it: `... %>% group_by(Mun_Code, Code, Club) %>% arrange(-value, .by_group = TRUE) %>% slice(1) %>% ...` – Darren Tsai Jul 23 '22 at 10:42
You are right. I thought to have more than 1 of each drink in the group. But now I see that there is always only one drink. Therefore we can omit summarise... – TarJae Jul 23 '22 at 12:11

score 0 · Answer 3 · answered Jul 23 '22 at 08:56

You could gather the drinks columns to one column and slice the max value per group like this:

football_supporters <- read.table(text = "Mun_Code   Code   Municipality   Club       Coca_Cola    Fanta    Sprite
   3        11        NYC       Sharks        152          92       148
   3        12        NYC       Wolfs          93          72        54
   3        13        NYC       Dogs          143          67        38
   3        14        NYC       Cats          32           99       156", header = TRUE)

library(dplyr)
library(tidyr)
football_supporters %>%
  gather(Drink, cnt, Coca_Cola:Sprite) %>% 
  group_by(Club) %>% 
  slice(which.max(cnt)) %>%
  select(-cnt)
#> # A tibble: 4 × 5
#> # Groups:   Club [4]
#>   Mun_Code  Code Municipality Club   Drink    
#>      <int> <int> <chr>        <chr>  <chr>    
#> 1        3    14 NYC          Cats   Sprite   
#> 2        3    13 NYC          Dogs   Coca_Cola
#> 3        3    11 NYC          Sharks Coca_Cola
#> 4        3    12 NYC          Wolfs  Coca_Cola

^{Created on 2022-07-23 by the reprex package (v2.0.1)}

3

see here . Just for info :-) – TarJae Jul 23 '22 at 09:03

score 0 · Answer 4 · answered Jul 23 '22 at 17:32

We could also achieve the same end by using purrr's pmap:

library(dplyr)
library(purrr)

football_supporters |>
  mutate(Drink = pmap(across(Coca_Cola:Sprite),
                      ~ c(...)[c(...) |> which.max())] |> names()
                      ),
         .keep = "unused") #|>
  #select(-Municipality)

Output:

  Mun_Code Code Municipality   Club     Drink
1        3   11          NYC Sharks Coca_Cola
2        3   12          NYC  Wolfs Coca_Cola
3        3   13          NYC   Dogs Coca_Cola
4        3   14          NYC   Cats    Sprite

Create a new column in tibble to summarise largest row of selected elements

4 Answers4