
I feel like there should be an efficient way to mutate new columns with dplyr using case_when and contains, but I cannot get it to work.

I understand using case_when within mutate is "somewhat experimental" (as in this post), but would be grateful for any suggestions.

Doesn't work:

library(tidyverse)

set.seed(1234)

x <- c("Black", "Blue", "Green", "Red")

df <- data.frame(a = 1:20, 
                 b = sample(x,20, replace=TRUE))

df <- df %>%
  mutate(group = case_when(.$b(contains("Bl")) ~ "Group1",
                 case_when(.$b(contains("re", ignore.case=TRUE)) ~ "Group2")
  )  
user438383
Peter MacPherson
    I believe `contains` is only to be used inside `select`. At least, that's what I gather from the documentation of `?contains`. – Rich Scriven Apr 29 '17 at 13:42
    Thanks - yes I thought that might be true, but wasn't sure from the documentation. Seems like it might be useful within `mutate` too, although the `grep` solution below is a good alternative. – Peter MacPherson Apr 29 '17 at 13:46
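To make the distinction in these comments concrete: `contains()` is a tidyselect helper that matches column *names* in selection contexts such as `select()`, while testing cell *values* needs a string matcher such as `grepl()`. A minimal sketch on a toy data frame (the `fruit` columns are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data frame, used only to show name matching
fruit <- data.frame(apple = 1:2, banana = 3:4, blueberry = 5:6)

# contains() matches column NAMES (case-insensitively by default) inside select()
picked <- select(fruit, contains("b"))
names(picked)  # "banana" "blueberry"

# To test cell VALUES, e.g. inside mutate()/case_when(), use a string matcher
grepl("Bl", c("Black", "Green"))  # TRUE FALSE
```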



We can use `grepl` (from the `grep` family), which returns a logical vector that `case_when` can use directly:

df %>%  
   mutate(group = case_when(grepl("Bl", b) ~ "Group1",
                            grepl("re", b, ignore.case = TRUE) ~"Group2"))
#    a     b  group
#1   1 Black Group1
#2   2 Green Group2
#3   3 Green Group2
#4   4 Green Group2
#5   5   Red Group2
#6   6 Green Group2
#7   7 Black Group1
#8   8 Black Group1
#9   9 Green Group2
#10 10 Green Group2
#11  1 Green Group2
#12  2 Green Group2
#13  3  Blue Group1
#14  4   Red Group2
#15  5  Blue Group1
#16  6   Red Group2
#17  7  Blue Group1
#18  8  Blue Group1
#19  9 Black Group1
#20 10 Black Group1
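A note on how `case_when()` resolves overlaps: conditions are checked top to bottom and the first match wins, and rows that match no condition get `NA` unless a final `TRUE ~` catch-all is supplied. A small sketch on made-up values (`"BlueGreen"` and `"Purple"` are hypothetical, chosen only to show the behaviour):

```r
library(dplyr)

# Hypothetical colours: "BlueGreen" matches both patterns, "Purple" matches neither
cols <- data.frame(b = c("Black", "Red", "BlueGreen", "Purple"),
                   stringsAsFactors = FALSE)

# Without a catch-all, unmatched rows become NA
no_default <- cols %>%
  mutate(group = case_when(grepl("Bl", b) ~ "Group1",
                           grepl("re", b, ignore.case = TRUE) ~ "Group2"))

# With TRUE ~ ..., unmatched rows get a default label
with_default <- cols %>%
  mutate(group = case_when(grepl("Bl", b) ~ "Group1",
                           grepl("re", b, ignore.case = TRUE) ~ "Group2",
                           TRUE ~ "Other"))

no_default$group    # returns c("Group1", "Group2", "Group1", NA)
with_default$group  # returns c("Group1", "Group2", "Group1", "Other")
```

"BlueGreen" lands in Group1 because the `"Bl"` condition is listed first.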
akrun
    Looks like this can also be achieved with `str_detect` now: https://community.rstudio.com/t/help-me-write-this-script-for-case-when-inside-dplyr-mutate-and-ill-acknowledge-by-name-you-in-my-article/6564 – QAsena Oct 26 '20 at 20:56
    @QAsena yes, you are right. `str_detect` also works, but `grep` is a bit more general in that it can work in different regex modes, e.g. Perl-compatible regex via `perl = TRUE`. – akrun Oct 26 '20 at 20:59
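For comparison, a sketch of the `str_detect` version mentioned in these comments, using the question's data. Note that `str_detect()` takes the string first and the pattern second, and case-insensitivity is requested with stringr's `regex(ignore_case = TRUE)` rather than an `ignore.case` argument:

```r
library(dplyr)
library(stringr)

set.seed(1234)
x <- c("Black", "Blue", "Green", "Red")
df <- data.frame(a = 1:20, b = sample(x, 20, replace = TRUE),
                 stringsAsFactors = FALSE)

# stringr version: string first, then pattern
df_str <- df %>%
  mutate(group = case_when(str_detect(b, "Bl") ~ "Group1",
                           str_detect(b, regex("re", ignore_case = TRUE)) ~ "Group2"))

# base R version from the answer above, for comparison
df_grepl <- df %>%
  mutate(group = case_when(grepl("Bl", b) ~ "Group1",
                           grepl("re", b, ignore.case = TRUE) ~ "Group2"))

identical(df_str$group, df_grepl$group)  # both approaches give the same groups
```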

I wanted to add an example using str_detect with a pasted-together pattern, which also makes grouping many common values a cinch. Say you're working with gapminder or another country data frame.

library(dplyr)
library(stringr)
library(magrittr)  # provides the %<>% compound assignment pipe

interest <- c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
              "Czech Republic", "Denmark", "Estonia", "Finland",
              "France", "Germany", "Greece", "Hungary", "Ireland",
              "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta",
              "The Netherlands", "Poland", "Portugal", "Romania",
              "Slovakia", "Slovenia", "Spain", "Sweden", "United Kingdom")

# Convert the names to ISO2 codes and join them into one regex
# alternation, e.g. "AT|BE|BG|...". paste0() has no sep argument,
# so use paste() with collapse = "|" to avoid a trailing "|".
EU <- paste(countrycode::countryname(sourcevar = interest, destination = "iso2c"),
            collapse = "|")

# Assumes df$Country holds ISO2 codes such as "AT"
df %<>% mutate(Region = case_when(
  str_detect(Country, EU) ~ "EU",
  TRUE ~ "Not EU"))

You'll need to load `library(magrittr)` to get the compound assignment pipe `%<>%` to work; `df %<>% f()` is shorthand for `df <- df %>% f()`.
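One pitfall when building the pattern by pasting: `paste0()` has no `sep` argument, so `paste0(codes, sep = "|", collapse = "")` appends `"|"` to every element and leaves a trailing `|`. That empty final alternative matches any string, so every row would be labelled "EU". A base-R sketch with a hypothetical three-code vector standing in for the countrycode output:

```r
# Hypothetical ISO2 codes standing in for countrycode's output
codes <- c("AT", "BE", "BG")

bad  <- paste0(codes, sep = "|", collapse = "")  # "AT|BE|BG|" (trailing empty alternative)
good <- paste(codes, collapse = "|")             # "AT|BE|BG"

grepl(bad,  "US", perl = TRUE)  # TRUE: the empty alternative matches anything
grepl(good, "US", perl = TRUE)  # FALSE

# If the column holds exact codes, %in% avoids regex and partial matches entirely
"US" %in% codes  # FALSE
```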
ibm