
I have a small DSL that lets me group variables by name:

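library(rlang)  # provides quos() and quo_text()

group <- function(.data, ...) {
  dots <- quos(...)
  for (i in seq_along(dots)) {
    # Deparse the captured expression, e.g. "two + three + four"
    in_group <- quo_text(dots[[i]])
    # Split on "+" and trim whitespace to get the member names
    vec <- trimws(unlist(strsplit(in_group, "[+]")))
    # Append a new column named group_<name>
    .data <- cbind(.data, TRUE)
    names(.data) <- c(names(.data)[-length(names(.data))], paste0("group_", names(dots[i])))
    .data[, ncol(.data)] <- .data$vars %in% vec
  }
  return(.data)
}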

library(magrittr)
# Some data
df <- data.frame(
  vars = c("one", "two", "three", "four"), stringsAsFactors = FALSE
)

# Define a group called abc containing elements two, three and four:
df %>% group(abc = two + three + four)
   vars group_abc
1   one     FALSE
2   two      TRUE
3 three      TRUE
4  four      TRUE

# Define multiple groups
df %>% group(odd = one + three, even = two + four, prime = one + two + three)
   vars group_odd group_even group_prime
1   one      TRUE      FALSE        TRUE
2   two     FALSE       TRUE        TRUE
3 three      TRUE      FALSE        TRUE
4  four     FALSE       TRUE       FALSE

However, this does not allow groups to be redefined:

df %>% group(abc = two + three + four) %>% group(abc = two)
   vars group_abc group_abc
1   one     FALSE     FALSE
2   two      TRUE      TRUE
3 three      TRUE     FALSE
4  four      TRUE     FALSE

The group abc is defined twice instead of being overwritten.

I tried:

group2 <- function(.data, ...) {
  dots <- quos(...)
  for (i in seq_along(dots)) {
    in_group <- rlang::quo_text(dots[[i]])
    vec <- trimws(unlist(strsplit(in_group, "[+]")))
    if (any(grepl(names(dots[i]), names(.data)))) {
      .data[, grepl(names(dots[i]), names(.data))] <- .data$vars %in% vec
    } else {
      .data <- cbind(.data, TRUE)
      names(.data) <- c(names(.data)[-length(names(.data))], paste0("group_", names(dots[i])))
      .data[, ncol(.data)] <- .data$vars %in% vec
    }
  }
  return(.data)
}

df %>% group2(abc = two + three + four) %>% group2(abc = two)
   vars group_abc
1   one     FALSE
2   two      TRUE
3 three     FALSE
4  four     FALSE

This sort of works, but it looks extremely ugly.

So my question is: What's a good way to redefine groups in my group DSL?

Thanks for any hints.


Some more context:

Here's another question of mine on the general topic of this DSL.

symbolrush

1 Answer

This is a really fun question. You can use dplyr::mutate to "overwrite" existing variables: it assigns columns by name, so a column that already exists is replaced rather than duplicated. We can also streamline your loops with purrr::map. The main idea is to tokenize the provided expressions and construct new ones that look like vars %in% c( "token1", "token2", etc. ). The resulting expressions are then passed to mutate:

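library( tidyverse )

group <- function(.data, ...) {
  # Capture each group definition and deparse it to a string, e.g. "one + three"
  dots  <- enexprs(...) %>% map_chr(rlang::expr_text)
  nms   <- str_c( "group_", names(dots) )
  # Tokenize on "+" and build one `vars %in% c(...)` expression per group
  elems <- dots %>% str_split("[+]") %>% map(str_trim) %>%
                    map( ~expr(vars %in% !!.x) ) %>% set_names(nms)
  # mutate() assigns by column name, so an existing group_* column is replaced
  .data %>% mutate( !!!elems )
}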

df %>% group(odd = one + three, even = two + four, prime = one + two + three)
#    vars group_odd group_even group_prime
# 1   one      TRUE      FALSE        TRUE
# 2   two     FALSE       TRUE        TRUE
# 3 three      TRUE      FALSE        TRUE
# 4  four     FALSE       TRUE       FALSE
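
To make the tokenize-and-rebuild step more concrete, here is a minimal sketch of the expression built for a single group. The helper name build_group_expr is hypothetical and not part of the function above; the sketch assumes only that rlang is installed.

```r
library(rlang)

# Hypothetical helper (for illustration only): turn one group definition
# such as `two + three + four` into the expression `vars %in% c(...)`.
build_group_expr <- function(group_expr) {
  # Deparse to a string, split on "+", and trim whitespace
  tokens <- trimws(strsplit(expr_text(group_expr), "+", fixed = TRUE)[[1]])
  # Inline the character vector of tokens into the new expression
  expr(vars %in% !!tokens)
}

df <- data.frame(
  vars = c("one", "two", "three", "four"), stringsAsFactors = FALSE
)

e <- build_group_expr(quote(two + three + four))
print(e)                 # shows the constructed call
membership <- eval(e, df)  # evaluate it with df's columns in scope
print(membership)
# [1] FALSE  TRUE  TRUE  TRUE
```

Each such expression is what ends up on the right-hand side of a named mutate() argument.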

df %>% group( abc = two + three + four ) %>% group( abc = two )
#    vars group_abc
# 1   one     FALSE
# 2   two      TRUE
# 3 three     FALSE
# 4  four     FALSE
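
If you would rather avoid the tidyverse dependency, the same overwrite behaviour can be sketched in base R. This is an alternative under stated assumptions, not the approach used above: the name group_base is hypothetical, and it swaps string splitting for all.vars(), which collects the symbol names that appear in an expression. Assigning with [[<- by column name replaces an existing column or creates a new one.

```r
# Hypothetical base R variant (assumption: group members are bare names
# joined by `+`, as in the question).
group_base <- function(.data, ...) {
  # Capture the unevaluated, named arguments as a list of expressions
  dots <- eval(substitute(alist(...)))
  for (nm in names(dots)) {
    # all.vars() returns the symbols in the expression, e.g. "two" "three"
    members <- all.vars(dots[[nm]])
    # Assignment by name overwrites group_<nm> if it already exists
    .data[[paste0("group_", nm)]] <- .data$vars %in% members
  }
  .data
}

df <- data.frame(
  vars = c("one", "two", "three", "four"), stringsAsFactors = FALSE
)

res <- group_base(group_base(df, abc = two + three + four), abc = two)
print(res)
#    vars group_abc
# 1   one     FALSE
# 2   two      TRUE
# 3 three     FALSE
# 4  four     FALSE
```

Because all.vars() works on the parsed expression rather than on its text, it does not depend on how the expression is deparsed or spaced.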
Artem Sokolov