I'm having trouble using case_when inside mutate to encode a condition. I'm trying to create a new column called treatment: if the name of a country (in the column name) begins with a vowel, treatment should read "1"; otherwise it should read "0". I've tried a few things, but nothing seems to work.

mutate("treatment" = 
        case_when
        (str_subset(name, pattern = "^[AEIOU]")) ~"1", 
         str_subset(name, pattern = "[^AEIOU]") ~ "0")

Current error message reads: Error: Column treatment is of unsupported type quoted call.

If anyone can help, I would really appreciate it!

  • Can you try removing double quote from "treatment"? – Tung Feb 24 '20 at 03:00
  • Just tried–same error popped up. – rbeginnermark Feb 24 '20 at 03:01
  • Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Feb 24 '20 at 05:40

1 Answer

I've created a little example which I hope helps.

Some things to consider:

  1. The left-hand side of each formula in case_when() must be a logical vector (TRUE or FALSE for every row). The str_subset() function you used returns the strings that match the pattern, not a logical vector. In the example below I use str_starts(), which returns a logical value for each element of the input.

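  2. NA values never satisfy a condition in case_when(), so rows with a missing country end up as NA in the new column. You can also handle them explicitly if you prefer (for example with an is.na() condition). Check out the documentation, ?case_when, for an example of this.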
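
To illustrate the second point, here is a minimal sketch of handling missing names explicitly. The data frame `countries`, the result name `result`, and the "missing" label are made up for illustration; the column names `name` and `treatment` follow the question. Conditions are checked in order, so the is.na() test goes first and a final TRUE acts as the catch-all.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; NA stands in for a missing country name
countries <- tibble(name = c("Uruguay", "Brazil", NA))

result <- countries %>%
  mutate(
    treatment = case_when(
      is.na(name)                  ~ "missing", # handle NA explicitly, first
      str_detect(name, "^[AEIOU]") ~ "1",       # begins with an uppercase vowel
      TRUE                         ~ "0"        # every remaining row
    )
  )

result$treatment
```

Without the is.na() line, the NA row would fall through to the TRUE branch and be labelled "0", which is probably not what you want for missing data.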

Good luck and welcome to R!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

# create data frame with countries, include NA for demonstration
df <- tibble(
 country = c("Columbia", "Uruguay", "Argentina", "Brazil", NA)
)

df2 <- 
  df %>% 
  mutate(
    starts_vowel = 
      case_when(
      # left hand side of case_when must be a logical
      str_starts(country, "A|E|I|O|U") ~ 1,
      # negate = TRUE flips the test: TRUE where the name does not start with a vowel
      str_starts(country, "A|E|I|O|U", negate = TRUE) ~ 0, 
      )
  )

df2
#> # A tibble: 5 x 2
#>   country   starts_vowel
#>   <chr>            <dbl>
#> 1 Columbia             0
#> 2 Uruguay              1
#> 3 Argentina            1
#> 4 Brazil               0
#> 5 <NA>                NA

# Check out the difference between str_subset() and str_starts()
str_subset(df$country, "^[AEIOU]")
#> [1] "Uruguay"   "Argentina"
str_starts(df$country, "A|E|I|O|U")
#> [1] FALSE  TRUE  TRUE FALSE    NA

Created on 2020-02-24 by the reprex package (v0.3.0)
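
Since the question asks for the strings "1" and "0" in a column called treatment, and there are only two outcomes, the same idea might also be written with if_else() instead of case_when(). This is only a sketch: the data frame `countries` and the result name `treated` are hypothetical, and the column name `name` is taken from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical data using the question's column name, `name`
countries <- tibble(name = c("Colombia", "Uruguay", "Argentina", "Brazil", NA))

treated <- countries %>%
  mutate(
    # str_detect() returns a logical vector, which if_else() maps to "1" / "0";
    # a missing name gives an NA condition and therefore an NA result
    treatment = if_else(str_detect(name, "^[AEIOU]"), "1", "0")
  )

treated$treatment
```

Note that the pattern is case-sensitive, so a name written in lowercase would not match. Wrapping the pattern in stringr's regex(..., ignore_case = TRUE) is one way to relax that.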