R: Conditional evaluation of expressions given as strings as values for a dplyr mutate

Question

I am trying to do a dplyr::mutate() using case_when to select between various formulas that are assembled from pieces that are strings. However, I am clearly not converting the strings into expressions properly before quoting and then unquoting them. I have tried seven or eight ways of doing this, all unsuccessful.

The reason for assembling the expression from strings is that I have a large number of groups of variables that have names differing only by a suffix, for example, to distinguish variables in nominal or inflation-adjusted dollars. I use case_when because similar variables have different names, and sometimes different aggregation structures, in different years.

This is a much-simplified example:

bus_inc <- function(tb, suffix) {
  bus1     <- quo(paste0("incbus", suffix, " + ",  "incfarm", suffix, collapse = ""))
  bus2     <- quo(paste0("incbus2", suffix, " + ",  "incfarm", suffix, collapse = ""))
  bus3     <- quo(paste0("incbus", suffix, " + ",  "incfarm2", suffix, collapse = ""))
  out      <- mutate(tb, bus = case_when((year < 1968) ~ UQ(bus1),
                                   ((year > 1967) & (year < 1976)) ~ UQ(bus2),
                                   (year > 1975) ~ UQ(bus3)))
  out
}

Data:

incbus_99     <-   1:56
incfarm_99   <-  57:112
incbus2_99   <-  incbus_99 + 0.5
incfarm2_99 <-  incfarm_99 * 10
year <- 1962:2017
test_tb <- tibble(year, incbus_99, incfarm_99, incbus2_99, incfarm2_99)

my_test <- bus_inc(tb  = test_tb, suffix = "_99")
my_test

The value of bus should be 58 in year 1962 and 70.5 in 1968.

I have found a number of places that suggest parse(text="my_string") as a way of converting a string into an expression, such as this early example (2002) from Martin Maechler. But I have also found a bunch of places that say never to do this, such as Fortune 106 and this recent example from Martin Maechler. I take this forceful repudiation by the formidable Dr. Maechler of a solution he had previously offered as strong evidence that this is not a good idea, but I do not understand his proposed alternatives, as they seem to evaluate to strings.

You could do e.g. `rlang::parse_expr(stringr::str_glue("incbus{suffix} + incfarm{suffix}"))`. But something else that might make a lot of sense would be to actually get rid of the suffixed columns entirely by reshaping your data into "long form" such that the information carried by the suffix would become a new column instead. — Mikko Marttila, May 12 '18 at 19:28
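For anyone who wants to try the first suggestion in the comment above, here is a rough sketch of how it might slot into the function. It assumes rlang is installed (it comes with dplyr) and uses paste0() in place of stringr::str_glue() only to avoid an extra dependency. The name bus_inc_parsed is made up for this illustration, and the as.numeric() wrappers are there to keep all three branches the same type.

```r
library(dplyr)

# Hypothetical variant of bus_inc(): build each formula as text, then
# parse the text into an unevaluated expression with rlang::parse_expr().
bus_inc_parsed <- function(tb, suffix) {
  bus1 <- rlang::parse_expr(paste0("incbus", suffix, " + incfarm", suffix))
  bus2 <- rlang::parse_expr(paste0("incbus2", suffix, " + incfarm", suffix))
  bus3 <- rlang::parse_expr(paste0("incbus", suffix, " + incfarm2", suffix))
  # !! splices each parsed expression into the case_when() call, so it is
  # evaluated against the columns of tb rather than treated as a string.
  mutate(tb, bus = case_when(
    year < 1968               ~ as.numeric(!!bus1),
    year > 1967 & year < 1976 ~ as.numeric(!!bus2),
    year > 1975               ~ as.numeric(!!bus3)
  ))
}

# Same data as in the question, built in one call.
test_tb <- tibble(
  year        = 1962:2017,
  incbus_99   = 1:56,
  incfarm_99  = 57:112,
  incbus2_99  = incbus_99 + 0.5,
  incfarm2_99 = incfarm_99 * 10
)

res <- bus_inc_parsed(test_tb, suffix = "_99")
res$bus[res$year %in% c(1962, 1968)]  # 58.0 70.5, matching the question
```

Parsing text is convenient here because the formulas are short and fully under your control; it would be riskier if the strings came from user input.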

score 1 · Answer 1 · answered May 12 '18 at 11:39

Maybe use a combination of sym() and expr(). You also need as.numeric(), because inconsistent types will throw an error in case_when():

bus_inc <- function(tb, suffix) {
  bus1 <- expr(!!sym(paste0('incbus', suffix)) + !!sym(paste0('incfarm', suffix)))
  bus2 <- expr(!!sym(paste0('incbus2', suffix)) + !!sym(paste0('incfarm', suffix)))
  bus3 <- expr(!!sym(paste0('incbus', suffix)) + !!sym(paste0('incfarm2', suffix)))
  mutate(tb, bus = case_when(year < 1968 ~ as.numeric(!!bus1),
                             year > 1967 & year < 1976 ~ as.numeric(!!bus2),
                             year > 1975 ~ as.numeric(!!bus3)))
}

library(dplyr)

incbus_99     <-   1:56
incfarm_99   <-  57:112
incbus2_99   <-  incbus_99 + 0.5
incfarm2_99 <-  incfarm_99 * 10
year <- 1962:2017
test_tb <- tibble(year, incbus_99, incfarm_99, incbus2_99, incfarm2_99)

bus_inc(tb  = test_tb, suffix = "_99")

# # A tibble: 56 x 6
#     year incbus_99 incfarm_99 incbus2_99 incfarm2_99   bus
#    <int>     <int>      <int>      <dbl>       <dbl> <dbl>
#  1  1962         1         57        1.5         570  58  
#  2  1963         2         58        2.5         580  60  
#  3  1964         3         59        3.5         590  62  
#  4  1965         4         60        4.5         600  64  
#  5  1966         5         61        5.5         610  66  
#  6  1967         6         62        6.5         620  68  
#  7  1968         7         63        7.5         630  70.5
#  8  1969         8         64        8.5         640  72.5
#  9  1970         9         65        9.5         650  74.5
# 10  1971        10         66       10.5         660  76.5
# # ... with 46 more rows
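It may help to see why the quo(paste0(...)) attempt in the question gives strings while the sym()/expr() version gives numbers. The sketch below is only an illustration: it uses a small made-up data frame called demo, and base eval() stands in for what mutate() does internally.

```r
library(rlang)

suffix <- "_99"

# The question's approach quotes a call to paste0(); evaluating that call
# just produces text that looks like a formula.
as_string <- paste0("incbus", suffix, " + ", "incfarm", suffix)
is.character(as_string)  # TRUE

# sym() turns each assembled name into a column symbol, and expr()
# combines the symbols into an unevaluated call to `+`.
as_call <- expr(!!sym(paste0("incbus", suffix)) + !!sym(paste0("incfarm", suffix)))
is.call(as_call)         # TRUE
deparse(as_call)         # "incbus_99 + incfarm_99"

# Evaluating the call with a data frame as the environment does the sum.
demo <- data.frame(incbus_99 = 1:3, incfarm_99 = 57:59)
eval(as_call, demo)      # 58 60 62
```

So the pieces are still assembled from strings, but only the column names are; the arithmetic itself is built as code rather than parsed from text.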
