
Following on from my previous question, I'm trying to create a function using tidyr::complete that fills a grouped/summarised tibble with the missing dates, with NA for the relevant values, as an intermediate step before further calculations.

I've almost got the function working, but am having trouble passing column names as arguments.

For reference, more info on what the function is trying to do is below. What I have so far is:

complete_dates <- function(data, datevar, grouping_vars) {
  calendar <- expand_grid("{{datevar}}" := seq(min(pull(data %>% select({{datevar}}))),  # Extract date vector from data 
                                               max(pull(data %>% select({{datevar}}))),by="1 day"))
  calendar %>% 
    left_join(data) %>% 
    ungroup() %>%
    complete({{datevar}}, {{grouping_vars}}) %>%
    filter(!if_any({{grouping_vars}}, is.na))
}

The problem arises in the line complete({{datevar}}, {{grouping_vars}}). As the name implies, I want to be able to pass multiple column names to include in the complete step. (It's called grouping_vars because it corresponds to the columns used for the original group_by %>% summarise in the first place.)

But while the syntax above works with a single column name, it doesn't work with a character vector of column names, e.g. c("GroupA", "GroupB").

I've read various SO articles about passing column names to R functions but I'm still an R noob and don't fully grasp the dplyr syntax, even after reading the relevant blog post. Can anyone advise on the syntax I need please?
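A common alternative to passing column names as strings is to forward bare column names through `...`, because `complete()` accepts any number of column arguments and rlang forwards them unchanged. The sketch below assumes dplyr and tidyr are installed; the function name `complete_dates_dots` and the toy tibble `toy` (with hypothetical columns `GroupA` and `GroupB`) are illustrations only, not part of the original code.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: grouping columns are passed as bare names via `...`
complete_dates_dots <- function(data, datevar, ...) {
  data %>%
    ungroup() %>%
    # Fill every day between the min and max date, crossed with every
    # combination of the grouping columns forwarded through `...`
    complete({{ datevar }} := full_seq({{ datevar }}, period = 1), ...)
}

# Hypothetical toy data with two grouping columns
toy <- tibble(
  Date   = as.Date(c("2021-02-18", "2021-02-20")),
  GroupA = c("x", "y"),
  GroupB = c("p", "q"),
  mean   = c(1.5, 2.5)
)

res <- complete_dates_dots(toy, Date, GroupA, GroupB)
res
```

With this design the call is `complete_dates_dots(toy, Date, GroupA, GroupB)` instead of `c("GroupA", "GroupB")`. Three dates crossed with two values in each grouping column give 3 × 2 × 2 = 12 rows, two of which keep their original `mean`.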


Info on function in question:

Basically, I'm starting with something like this:

grouped <- data %>% group_by(Date, Group) %>% summarise(mean = mean(Value))
head(grouped)
# A tibble: 6 × 3
# Groups:   Date [4]
  Date       Group  mean
  <date>     <fct> <dbl>
1 2021-02-18 A      37.4
2 2021-02-19 B      25.5
3 2021-02-19 A      26.1
4 2021-02-22 B      34.2
5 2021-02-22 A      26.4
6 2021-02-23 B      34.2

And want to get something like this:

   Date       Group  mean
   <date>     <fct> <dbl>
 1 2021-02-18 B      NA  
 2 2021-02-18 A      37.4
 3 2021-02-19 B      25.5
 4 2021-02-19 A      26.1
 5 2021-02-20 B      NA  
 6 2021-02-20 A      NA  
 7 2021-02-21 B      NA  
 8 2021-02-21 A      NA  
 9 2021-02-22 B      34.2
10 2021-02-22 A      26.4

where the missing dates are now present for every grouping level, with NA in the value column. Example data:

grouped <- structure(list(Date = structure(c(18676, 18677, 18677, 18680, 
18680, 18681, 18681), class = "Date"), Group = structure(c(2L, 
1L, 2L, 1L, 2L, 1L, 2L), levels = c("B", "A"), class = "factor"), 
    mean = c(37.43, 25.54, 26.13, 34.1966666666667, 26.4211111111111, 
    34.216, 22.8064285714286)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
    Date = structure(c(18676, 18677, 18680, 18681), class = "Date"), 
    .rows = structure(list(1L, 2:3, 4:5, 6:7), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))
TY Lim
  • @akrun added example data. I'm trying to pass multiple colnames to `complete`, and it seems `across` doesn't work there – TY Lim Mar 09 '23 at 18:37

1 Answer


Try

library(dplyr)
library(tidyr)
grouped %>%
   ungroup %>%
   complete(Date = full_seq(Date, period = 1), Group) 

Output:

# A tibble: 12 × 3
   Date       Group  mean
   <date>     <fct> <dbl>
 1 2021-02-18 B      NA  
 2 2021-02-18 A      37.4
 3 2021-02-19 B      25.5
 4 2021-02-19 A      26.1
 5 2021-02-20 B      NA  
 6 2021-02-20 A      NA  
 7 2021-02-21 B      NA  
 8 2021-02-21 A      NA  
 9 2021-02-22 B      34.2
10 2021-02-22 A      26.4
11 2021-02-23 B      34.2
12 2021-02-23 A      22.8

To wrap this in a function:

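complete_dates <- function(data, datevar, grouping_vars) {
  data %>%
    ungroup() %>%
    # rlang::syms() turns the character vector into symbols;
    # !!! splices them into complete() as separate column arguments
    complete("{{datevar}}" := full_seq({{datevar}}, period = 1),
             !!! rlang::syms(grouping_vars))
}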

and then call it as:

complete_dates(grouped, Date, c("GroupA", "GroupB"))
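With the question's example data, which has only one grouping column, the same function takes `"Group"` as a length-one character vector. The sketch below restates the function from the answer and rebuilds the example data as a plain (ungrouped) tibble so it is self-contained; it assumes dplyr, tidyr and rlang are installed. The result should match the 12-row output shown earlier (6 dates × 2 groups).

```r
library(dplyr)
library(tidyr)

# Function restated from the answer
complete_dates <- function(data, datevar, grouping_vars) {
  data %>%
    ungroup() %>%
    complete("{{datevar}}" := full_seq({{datevar}}, period = 1),
             !!! rlang::syms(grouping_vars))
}

# The question's example data, rebuilt without the grouped_df attributes
grouped <- tibble(
  Date  = as.Date("2021-02-18") + c(0, 1, 1, 4, 4, 5, 5),
  Group = factor(c("A", "B", "A", "B", "A", "B", "A"), levels = c("B", "A")),
  mean  = c(37.43, 25.54, 26.13, 34.1966666666667, 26.4211111111111,
            34.216, 22.8064285714286)
)

res <- complete_dates(grouped, Date, "Group")
res
```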
akrun
  • That's definitely more elegant than what I have right now, thanks. But, if I want to wrap it in a function, how do I pass multiple grouping vars (e.g. `GroupA, GroupB`) to `complete` instead of just `Group`? – TY Lim Mar 09 '23 at 18:40
  • @TYLim Suppose you are passing a vector of character names as `grp <- c("GroupA", "GroupB");` Inside the function, you can use `grouped %>% ungroup %>% complete(Date = full_seq(Date, period = 1), !!! rlang::syms(grp))` – akrun Mar 09 '23 at 18:42
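The one-liner from the last comment can be checked on data with two grouping columns. The sketch below assumes dplyr, tidyr and rlang are installed; the tibble `df` and its columns `GroupA` and `GroupB` are hypothetical, chosen only to match the names used in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with two grouping columns, GroupA and GroupB
df <- tibble(
  Date   = as.Date(c("2021-02-18", "2021-02-18", "2021-02-21")),
  GroupA = c("x", "y", "x"),
  GroupB = c("p", "p", "q"),
  mean   = c(1, 2, 3)
)

grp <- c("GroupA", "GroupB")

res <- df %>%
  ungroup() %>%
  # syms() turns each string into a symbol; !!! splices them as separate
  # arguments, i.e. complete(Date = ..., GroupA, GroupB)
  complete(Date = full_seq(Date, period = 1), !!! rlang::syms(grp))
res
```

Four dates crossed with two values of `GroupA` and two of `GroupB` give 16 rows, three of which carry a `mean`. Because `complete()` crosses the grouping columns, combinations that never occurred (such as `y`/`q`) are added too; if only existing combinations should be kept, wrapping the spliced symbols in tidyr's `nesting()` is the usual route.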