This seems fair enough; maybe it's a bug, or I am missing something very basic. I am trying to convert Species to a binary variable, so I am using case_when for a simple operation, but I get an error that I don't think should arise.

 iris %>% 
   dplyr::mutate(Species=as.factor(Species),
     Species=case_when(Species=="setosa"~"virginica",
                       TRUE~Species))


Error: Problem with `mutate()` input `Species`.
x must be a character vector, not a `factor` object.
i Input `Species` is `case_when(Species == "setosa" ~ "virginica", TRUE ~ Species)`.

Details on session info

 sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] conflicted_1.0.4 extrafontdb_1.0  extrafont_0.17   forcats_0.5.0   
 [5] purrr_0.3.4      readr_1.4.0      tidyr_1.1.2      tibble_3.0.4    
 [9] tidyverse_1.3.0  ggplot2_3.3.2    dplyr_1.0.2      stringr_1.4.0   

loaded via a namespace (and not attached):
 [1] qpdf_1.1           xfun_0.19          tidyselect_1.1.0  
 [4] haven_2.3.1        snakecase_0.11.0   colorspace_1.4-1  
 [7] vctrs_0.3.4        generics_0.1.0     usethis_1.6.3     
[10] htmltools_0.5.0    yaml_2.2.1         utf8_1.1.4        
[13] rlang_0.4.8        pillar_1.4.6       glue_1.4.2        
[16] withr_2.3.0        DBI_1.1.0          dbplyr_2.0.0      
[19] modelr_0.1.8       readxl_1.3.1       lifecycle_0.2.0   
[22] munsell_0.5.0      gtable_0.3.0       cellranger_1.1.0  
[25] rvest_0.3.6        memoise_1.1.0      evaluate_0.14     
[28] knitr_1.30         curl_4.3           fansi_0.4.1       
[31] Rttf2pt1_1.3.8     broom_0.7.2        pdftools_2.3.1    
[34] Rcpp_1.0.5         scales_1.1.1       backports_1.2.0   
[37] jsonlite_1.7.1     fs_1.5.0           hms_0.5.3         
[40] askpass_1.1        digest_0.6.27      stringi_1.5.3     
[43] grid_4.0.3         cli_2.1.0          tools_4.0.3       
[46] magrittr_1.5       crayon_1.3.4       pkgconfig_2.0.3   
[49] ellipsis_0.3.1     xml2_1.3.2         reprex_0.3.0      
[52] lubridate_1.7.9    tidytuesdayR_1.0.1 assertthat_0.2.1  
[55] rmarkdown_2.5      httr_1.4.2         rstudioapi_0.12   
[58] R6_2.5.0           compiler_4.0.3    
Vaibhav Singh

2 Answers

In the iris data set, the Species column is already a factor by default. You want character type here, so:

iris %>% 
    dplyr::mutate(Species=as.character(Species),
                  Species=case_when(Species=="setosa" ~ "virginica", TRUE ~ Species))
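
If a factor is still needed afterwards, one option is to convert back once the recoding is done. The following is a minimal sketch of that round trip, not necessarily what the answer intended; the name `result` is only illustrative.

```r
library(dplyr)

# Sketch: recode on a character column, then convert back to a factor.
# `result` is a hypothetical name used only for this example.
result <- iris %>%
  mutate(
    Species = as.character(Species),
    Species = case_when(Species == "setosa" ~ "virginica", TRUE ~ Species),
    Species = factor(Species)  # levels are rebuilt from the remaining values
  )

levels(result$Species)
table(result$Species)
```

Because `factor()` rebuilds the levels from the values that are present, the unused "setosa" level disappears and the column ends up with two levels.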
Tim Biegeleisen
  • But can't I use case_when on a factor? – Vaibhav Singh Dec 03 '20 at 05:31
  • Perhaps not, if under the hood `case_when` would be using `==` for equality comparisons. Then you would have to do something like compare the levels, not the actual factor variable itself. It's probably best to just use `character` here for simplicity. – Tim Biegeleisen Dec 03 '20 at 05:32

Using case_when on factor variables is a bit tricky.

case_when is type-strict, meaning all the right-hand-side values must evaluate to the same type. The first value you supply is of type character ("virginica") and the TRUE value is a factor, hence the type mismatch error. In addition, every value should be a factor with the same levels as your original data. Incorporating these changes, you could do:

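library(dplyr)

iris %>% 
  # Wrap the replacement in factor() with the original levels so that both
  # branches of case_when() return the same type
  mutate(Species=case_when(Species == "setosa" ~ 
                           factor("virginica", levels = levels(.$Species)),
                           TRUE ~ Species))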
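
If the goal is only to merge one level into another, it may be simpler to recode the factor's levels directly and skip case_when altogether. Below is a hedged sketch using `fct_recode()` from forcats (which is attached in the question's session); the name `recoded` is only illustrative, and this is an alternative to the approach above rather than a replacement for it.

```r
library(dplyr)
library(forcats)

# Sketch: rename the "setosa" level to "virginica". Because "virginica"
# already exists, the two levels are merged into one.
# `recoded` is a hypothetical name used only for this example.
recoded <- iris %>%
  mutate(Species = fct_recode(Species, virginica = "setosa"))

table(recoded$Species)
```

The column stays a factor throughout, so no type mismatch can arise, and the "setosa" level is dropped as part of the merge.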
Ronak Shah