2

I've a dataframe football_supporters that looks like this:

Mun_Code   Code   Municipality   Club       Coca_Cola    Fanta    Sprite
   3        11        NYC       Sharks        152          92       148
   3        12        NYC       Wolfs          93          72        54
   3        13        NYC       Dogs          143          67        38
   3        14        NYC       Cats           32          99       156
#With 200 more rows

I would like to create a new column through the mutate() function and summarise what each club's favourite Drink is like down below:

Acode     Code     Clubname    Drink
 11       0111      Sharks     Coca_cola
 12       0112      Wolfs      Coca_cola
 13       0113      Dogs       Coca_cola
 14       0114      Cats       Sprite
#With 200 more rows

I've tried

colMax <- function(football_supporters) sapply(football_supporters, max, na.rm = TRUE)

and it doesn't work.

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
Henry Oufh
  • 135
  • 1
  • 1
  • 8
  • 1
    [For each row return the column name of the largest value](https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value) – Henrik Jul 23 '22 at 09:41

4 Answers4

2

You can select the indices of the desired columns, and then use max.col, to get the highest column.

Note that you can use .keep = "unused" to remove all the columns that were used in the mutate call.

library(dplyr)

which_drink <- which(colnames(football_supporters) %in% c("Coca_Cola", "Fanta", "Sprite"))

football_supporters %>% 
  mutate(Drink = colnames(.[which_drink])[max.col(across(which_drink))], 
         .keep = "unused")

  Mun_Code Code Municipality   Club     Drink
1        3   11          NYC Sharks Coca_Cola
2        3   12          NYC  Wolfs Coca_Cola
3        3   13          NYC   Dogs Coca_Cola
4        3   14          NYC   Cats    Sprite
Maël
  • 45,206
  • 3
  • 29
  • 67
  • 1
    We could even skip the `which_drink` and do: `mutate(Drink = names(across(Coca_Cola:Sprite))[max.col(across(Coca_Cola:Sprite))], .keep = "unused")` or define `which_drink <- c("Coca_Cola", "Fanta", "Sprite")` and use it inside `across` as `all_of(which_drink)`. – harre Jul 23 '22 at 17:42
1

Updated code thanks to @Darren Tsai:

We could do it this way:

  1. get long format

  2. arrange and finally select the first:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(c(Coca_Cola, Fanta, Sprite)) %>% 
  group_by(Mun_Code, Code, Club) %>% 
  arrange(-value, .by_group = TRUE) %>% 
  slice(1) %>% 
  select(-c(value, Municipality))
   Mun_Code  Code Club   name     
     <int> <int> <chr>  <chr>    
1        3    11 Sharks Coca_Cola
2        3    12 Wolfs  Coca_Cola
3        3    13 Dogs   Coca_Cola
4        3    14 Cats   Sprite 
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    What's the purpose of `summarise(sum = sum(value))`? I can get the same output without it: `... %>% group_by(Mun_Code, Code, Club) %>% arrange(-value, .by_group = TRUE) %>% slice(1) %>% ...` – Darren Tsai Jul 23 '22 at 10:42
  • You are right. I thought to have more than 1 of each drink in the group. But now I see that there is always only one drink. Therefore we can omit summarise... – TarJae Jul 23 '22 at 12:11
0

You could gather the drinks columns to one column and slice the max value per group like this:

football_supporters <- read.table(text = "Mun_Code   Code   Municipality   Club       Coca_Cola    Fanta    Sprite
   3        11        NYC       Sharks        152          92       148
   3        12        NYC       Wolfs          93          72        54
   3        13        NYC       Dogs          143          67        38
   3        14        NYC       Cats          32           99       156", header = TRUE)

library(dplyr)
library(tidyr)
football_supporters %>%
  gather(Drink, cnt, Coca_Cola:Sprite) %>% 
  group_by(Club) %>% 
  slice(which.max(cnt)) %>%
  select(-cnt)
#> # A tibble: 4 × 5
#> # Groups:   Club [4]
#>   Mun_Code  Code Municipality Club   Drink    
#>      <int> <int> <chr>        <chr>  <chr>    
#> 1        3    14 NYC          Cats   Sprite   
#> 2        3    13 NYC          Dogs   Coca_Cola
#> 3        3    11 NYC          Sharks Coca_Cola
#> 4        3    12 NYC          Wolfs  Coca_Cola

Created on 2022-07-23 by the reprex package (v2.0.1)

Quinten
  • 35,235
  • 5
  • 20
  • 53
0

We could also achieve the same end by using purrr's pmap:

library(dplyr)
library(purrr)

football_supporters |>
  mutate(Drink = pmap(across(Coca_Cola:Sprite),
                      ~ c(...)[c(...) |> which.max())] |> names()
                      ),
         .keep = "unused") #|>
  #select(-Municipality)

Output:

  Mun_Code Code Municipality   Club     Drink
1        3   11          NYC Sharks Coca_Cola
2        3   12          NYC  Wolfs Coca_Cola
3        3   13          NYC   Dogs Coca_Cola
4        3   14          NYC   Cats    Sprite
harre
  • 7,081
  • 2
  • 16
  • 28