1

I am attempting to automate many of the tasks done when creating new shiny apps, by writing the needed code to files based on any given dataset. While creating code to be used as a starting point for factor levels, I have become stuck.

The idea is to gather all the unique values appearing in certain columns, and create character vectors from them that can then be altered as needed. The issue is that some of the desired levels span multiple columns, as more than one can be selected. I have managed to write almost working code, but it fails to behave as I expected at one point. The issue results in all but the first string being dropped when mappping a list of vectors. Sorry, it is hard to explain, hopefully you can see what I am doing below - and ask if anything is still not clear please.

### Starting point
data <- tibble(
  a = rep(c("foo", "bar"), 3),
  b = rep(c("baz", "zap"), 3),
  c = rep(c("yes", "no"), 3),
  c_opt_one = rep(c("c_one", ""), 3),
  c_opt_two = rep(c("c_two", ""), 3)
)

levels_meta <- tibble(
  column = c("a", "b", "c", "c_opt", "c_opt"),
  blah = rep(c("blah"), 5) <- multiple other columns, not needed here
)

### Desired output, with problem noted

#>levels
# a_responses <- c(
#   "foo" = "foo",   <- only first entry kept
#   "bar" = "bar"    <- missing
# )
# 
# b_responses <- c(
#   "baz" = "baz",   <- only first entry kept
#   "zap" = "zap"    <- missing
# )
# 
# c_responses <- c(
#   "yes" = "yes",   <- only first entry kept
#   "no" = "no"      <- missing
# )
# 
# c_opt_responses <- c(
#   "c_opt_one" = "c_one",   <- all kept as desired, but only because these
#   "c_opt_two" = "c_two"    <- come from single element vectors before combined
# )


### Processing code
level_names <- levels_meta %>%
  select(column) %>%
  group_by(column) %>%
  add_count()

multi_col_level_names <- level_names %>%
  filter(n > 1) %>%
  pull(column) %>%
  unique()

single_col_level_names <- setdiff(level_names$column, multi_col_level_names)

levels <- lapply(data, unique) %>%
  lapply(setdiff, "")

levels <- map(levels, ~ paste0("  \"", .x, "\"", " = \"", .x, "\""))

# Problem occurs here - only first entry is kept.
# I did try replacing the FALSE arg with levels[[.x]], but same result.
levels <- imap(levels, ~ ifelse(length(.x) == 1, str_replace(.x, "\\w+", .y), .x))

# Rest of code does work, including in case anyone could suggest a more efficient way

multi_col_levels <- map(
  multi_col_level_names,
  function(prefix) levels %>%
    keep(startsWith(names(.), prefix)) %>%
    set_names(str_replace(names(.), names(.), prefix))
) %>% squash()

multi_col_levels <- map(
  set_names(multi_col_level_names),
  ~ unlist(multi_col_levels[names(multi_col_levels) == .], use.names = FALSE)
)

levels <- c(levels[single_col_level_names], multi_col_levels)
levels <- map(levels, ~ paste0(.x, collapse = ",\n"))
levels <- imap(levels, ~ paste0(.y, "_responses <- c(\n", .x, "\n)"))

paste_lvls <- function(out, input) paste(out, input, sep = "\n\n")

levels <- levels %>% reduce(paste_lvls)
msm1089
  • 1,314
  • 1
  • 8
  • 19
  • Can you tell more about what you want to achieve here? `levels <- imap(levels, ~ ifelse(length(.x) == 1, str_replace(.x, "\\w+", .y), .x))` – harre Jun 02 '22 at 10:33
  • 1
    See https://stackoverflow.com/questions/1335830/why-cant-rs-ifelse-statements-return-vectors. – Ritchie Sacramento Jun 02 '22 at 10:53
  • 1
    @harre - For each vector in `levels`, if it it has a single element, replace the first word in that element with the name of that element in `levels`, o/w do nothing. – msm1089 Jun 02 '22 at 11:21
  • @RitchieSacramento - thanks for the redirect. Replacing `ifelse` with `\`if`\` worked! But, it seems a bit of a strange way to do it. Is there any reason not to use `\`if`\`? Also, what is the etiquette to answer this? If you want to add your comment as an answer will select as best answer. O/w, do I just answer with what I found that solved it on the linked to question? – msm1089 Jun 02 '22 at 11:42

2 Answers2

1

My suggestion is to keep it more simple than your imap/ifelse-solution. The problem should be relatively small, so a simple for loop can solve it with less hassle and more clarity (given that the rest of code does what you want):

for (eachlevel in names(levels)) {
  
  if(length(levels[[eachlevel]]) == 1) {
    
    levels[[eachlevel]] <- str_replace(levels[[eachlevel]], "\\w+", eachlevel)
    
  }
  
}
harre
  • 7,081
  • 2
  • 16
  • 28
  • I agree it is clearer to do it that way. I just always try to write things efficiently, not to play code golf, but as a way of learning better the syntax available, and what is possible. – msm1089 Jun 02 '22 at 11:49
  • 1
    My two cents: 1) If efficient means fast, you shouldn't see any decrease in performance for this size of problem. That being said, I understand do understand if you want to make it work with `imap` using if (@Ritchie Sacramento). 2) If efficiency and speed is an issue - and/or you want clearer code - you'd want to get rid of (some of) this string processing. – harre Jun 02 '22 at 11:57
0

I am not sure if the approach below is what you are after:

library(tidyverse)

levels_meta$column %>% 
  unique %>% 
  set_names(., paste0(., "_response")) %>% 
  map(. ,
      ~ {
      dat <- select(data, starts_with(.x) & ends_with(.x))
      if(length(dat) == 0) {
          dat <- select(data, starts_with(.x))
      }
      if (length(dat) == 1) {
        set_names(unique(dat[[.x]]))
      } else if (length(dat) > 1) {
          map(dat, ~ unique(.x[which(.x != "")]))
      } else {
        NULL
      }
    }
  )
#> $a_response
#>   foo   bar 
#> "foo" "bar" 
#> 
#> $b_response
#>   baz   zap 
#> "baz" "zap" 
#> 
#> $c_response
#>   yes    no 
#> "yes"  "no" 
#> 
#> $c_opt_response
#> $c_opt_response$c_opt_one
#> [1] "c_one"
#> 
#> $c_opt_response$c_opt_two
#> [1] "c_two"

Created on 2022-06-02 by the reprex package (v2.0.1)

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
  • Not quite @TimTeaFan - the output is going to be saved as text to file. Thats what all the escaped quotes and newlines are for. For 'single' columns (e.g. `a`, `b`, `c`) I want to end up with all unique elements in a vector each. For 'multi' column options (e.g. `c_opt_one`, `c_opt_two` - determined by `levels_meta` having more than one occurrence of the prefix `c_opt`) I want to combine all of these into a single vector, but with the resulting string having their associated column name as the LHS of the `=`. – msm1089 Jun 02 '22 at 11:31