I use a dynamic variable (e.g. ID) to reference a column whose name changes depending on which gene I am processing at the time. I then use case_when within mutate to create a new column whose values depend on that dynamic column.
I thought that !! (bang-bang) was what I needed to force evaluation of the variable's content; however, I did not get the expected output in my new column. Only !!as.name gave me the output I was expecting, and I do not fully understand why. Could someone explain why using only !! isn't appropriate in this case, and what is happening in !!as.name?
Here is a simple reproducible example that I made up to demonstrate what I am experiencing:
library(tidyverse)
ID <- "birth_year"
# Correct output
test <- starwars %>%
  mutate(FootballLeague = case_when(
    !!as.name(ID) < 10 ~ "U10",
    !!as.name(ID) >= 10 & !!as.name(ID) < 50 ~ "U50",
    !!as.name(ID) >= 50 & !!as.name(ID) < 100 ~ "U100",
    !!as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
# Incorrect output
test2 <- starwars %>%
  mutate(FootballLeague = case_when(
    !!(ID) < 10 ~ "U10",
    !!(ID) >= 10 & !!(ID) < 50 ~ "U50",
    !!(ID) >= 50 & !!(ID) < 100 ~ "U100",
    !!(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
# Incorrect output
test3 <- starwars %>%
  mutate(FootballLeague = case_when(
    as.name(ID) < 10 ~ "U10",
    as.name(ID) >= 10 & as.name(ID) < 50 ~ "U50",
    as.name(ID) >= 50 & as.name(ID) < 100 ~ "U100",
    as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
identical(test, test2)
# FALSE
identical(test2, test3)
# TRUE
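To narrow this down, here is a small sketch for inspecting what each form expands to before anything is evaluated. It assumes rlang is installed (it comes with the tidyverse) and that rlang::expr() applies the same !! substitution as mutate() does. The names with_string and with_symbol are just ones I made up for this check.

```r
library(rlang)

ID <- "birth_year"

# expr() substitutes whatever follows !! and returns the resulting
# expression unevaluated, so the expanded code can be inspected.
with_string <- expr(!!ID < 10)           # !! applied to the character string
with_symbol <- expr(!!as.name(ID) < 10)  # !! applied to a symbol

print(with_string)
print(with_symbol)

# Inspect the left-hand side of each comparison.
is.character(with_string[[2]])  # is the string itself inserted?
is.symbol(with_symbol[[2]])     # is a column reference inserted?
```

If the first expression really contains the literal string rather than a reference to the column, then the comparison is between a character value and a number, which R performs as a string comparison. That might explain why test2 does not depend on the column's values at all.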
sessionInfo()
#R version 4.0.2 (2020-06-22)
#Platform: x86_64-centos7-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)
# tidyverse_1.3.0
# dplyr_1.0.2
Cheers!