Following on from my previous question, I'm trying to create a function using tidyr::complete
that can fill in a grouped/summarised tibble with missing dates, with NA
for relevant values, as an intermediate step before further calculations.
I've almost got the function working, but am having trouble with passing column names as arguments.
For reference, more info on what the function is trying to do is below. What I have so far is:
library(dplyr)
library(tidyr)

complete_dates <- function(data, datevar, grouping_vars) {
  # Build a calendar of every day between the first and last date in the data
  calendar <- expand_grid("{{datevar}}" := seq(min(pull(data %>% select({{datevar}}))),
                                               max(pull(data %>% select({{datevar}}))),
                                               by = "1 day"))
  calendar %>%
    left_join(data) %>%
    ungroup() %>%
    complete({{datevar}}, {{grouping_vars}}) %>%
    filter(!if_any({{grouping_vars}}, is.na))
}
The problem arises in the line complete({{datevar}}, {{grouping_vars}}). As the name implies, I want to be able to pass multiple column names to include in the complete step. (It's called grouping_vars because it corresponds to the columns used for the original group_by %>% summarise in the first place.)
But while the syntax above works with a single column name, it doesn't work with a character vector of column names, e.g. c("GroupA", "GroupB").
I've read various SO articles about passing column names to R functions but I'm still an R noob and don't fully grasp the dplyr
syntax, even after reading the relevant blog post. Can anyone advise on the syntax I need please?
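One possible direction, sketched here under assumptions rather than as a confirmed answer: if grouping_vars is always a character vector, the names could be converted to symbols and spliced into complete() with !!!. The function name complete_dates_chr and the demo tibble are hypothetical, and this variant builds the date sequence inside complete() instead of joining a separate calendar.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: grouping_vars is a character vector, e.g. c("GroupA", "GroupB")
complete_dates_chr <- function(data, datevar, grouping_vars) {
  date_name <- rlang::as_name(rlang::enquo(datevar))  # date column name as a string
  data %>%
    ungroup() %>%
    complete(
      # Replace the date column with the full daily sequence
      !!date_name := seq(min({{ datevar }}), max({{ datevar }}), by = "1 day"),
      # Splice each name in as its own column argument
      !!!rlang::syms(grouping_vars)
    )
}

# Made-up data with two grouping columns, for illustration only
demo <- tibble(
  Date   = as.Date(c("2021-02-18", "2021-02-20")),
  GroupA = c("x", "y"),
  GroupB = c("p", "p"),
  mean   = c(1, 2)
)

demo_completed <- complete_dates_chr(demo, Date, c("GroupA", "GroupB"))
```

Here datevar is still passed unquoted while grouping_vars is passed as strings; whether that mix is the right interface is a separate design choice.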
Info on function in question:
Basically, I'm starting with something like this:
grouped <- data %>% group_by(Date, Group) %>% summarise(mean = mean(Value))
head(grouped)
# A tibble: 6 × 3
# Groups:   Date [4]
  Date       Group  mean
  <date>     <fct> <dbl>
1 2021-02-18 A      37.4
2 2021-02-19 B      25.5
3 2021-02-19 A      26.1
4 2021-02-22 B      34.2
5 2021-02-22 A      26.4
6 2021-02-23 B      34.2
And want to get something like this:
   Date       Group  mean
   <date>     <fct> <dbl>
 1 2021-02-18 B      NA
 2 2021-02-18 A      37.4
 3 2021-02-19 B      25.5
 4 2021-02-19 A      26.1
 5 2021-02-20 B      NA
 6 2021-02-20 A      NA
 7 2021-02-21 B      NA
 8 2021-02-21 A      NA
 9 2021-02-22 B      34.2
10 2021-02-22 A      26.4
where the missing dates are now there, with relevant grouping variables, but with values of NA.
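To make the target concrete, this is roughly what the step looks like with the column names hard-coded, which is the behaviour the function is meant to generalise. It is a minimal sketch: grouped_demo is a small stand-in with values rounded from the printed output above, not the real data.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the grouped summary (first five rows, rounded values)
grouped_demo <- tibble(
  Date  = as.Date(c("2021-02-18", "2021-02-19", "2021-02-19",
                    "2021-02-22", "2021-02-22")),
  Group = factor(c("A", "B", "A", "B", "A"), levels = c("B", "A")),
  mean  = c(37.4, 25.5, 26.1, 34.2, 26.4)
)

completed_demo <- grouped_demo %>%
  ungroup() %>%
  # Every day from first to last date, crossed with every level of Group;
  # combinations missing from the data get NA in the remaining columns
  complete(Date = seq(min(Date), max(Date), by = "1 day"), Group)
```

With five days and two groups this gives ten rows, five of which have NA for mean.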
Example data:
grouped <- structure(list(Date = structure(c(18676, 18677, 18677, 18680,
18680, 18681, 18681), class = "Date"), Group = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L), levels = c("B", "A"), class = "factor"),
mean = c(37.43, 25.54, 26.13, 34.1966666666667, 26.4211111111111,
34.216, 22.8064285714286)), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
Date = structure(c(18676, 18677, 18680, 18681), class = "Date"),
.rows = structure(list(1L, 2:3, 4:5, 6:7), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))