I am trying to replicate this SO question, but using the updated syntax built on the across() function, which gets away from the deprecated summarise_all() and funs().
Starting Data
I have a database extract that has one row per event type, like so:
library(tidyverse)
library(zoo)
df_start <- tibble(shipment = c(rep("A",4), rep("B",4)),
stop = rep(c(1,1,2,2), 2),
arrive_pickup = as.POSIXct(c("2021-01-01 07:00:00 UTC",NA, NA, NA,"2021-06-05 12:10:00 UTC", NA, NA, NA)),
depart_pickup = as.POSIXct(c(NA,"2021-01-01 08:40:00 UTC", NA, NA, NA, "2021-06-05 16:58:00 UTC", NA, NA)),
arrive_delivery = as.POSIXct(c(NA, NA, "2021-01-05 10:00:00 UTC",NA, NA, NA,"2021-06-08 10:58:00 UTC", NA)),
depart_delivery = as.POSIXct(c(NA, NA, NA, "2021-01-05 11:30:00 UTC",NA, NA, NA,"2021-06-08 13:50:00 UTC"))
)
> df_start
# A tibble: 8 x 6
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 1 2021-01-01 07:00:00 NA NA NA
2 A 1 NA 2021-01-01 08:40:00 NA NA
3 A 2 NA NA 2021-01-05 10:00:00 NA
4 A 2 NA NA NA 2021-01-05 11:30:00
5 B 1 2021-06-05 12:10:00 NA NA NA
6 B 1 NA 2021-06-05 16:58:00 NA NA
7 B 2 NA NA 2021-06-08 10:58:00 NA
8 B 2 NA NA NA 2021-06-08 13:50:00
Desired Outcome
... and I want to collapse the number of rows by grouping by either shipments and stops, or even just by shipments (I'm not sure if leaving NA present in the final dataframe will affect the answer, but I'm seeking to be able to solve it either way).
df_finish1 # One desired outcome
# A tibble: 4 x 6
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 1 2021-01-01 07:00:00 2021-01-01 08:40:00 NA NA
2 A 2 NA NA 2021-01-05 10:00:00 2021-01-05 11:30:00
3 B 1 2021-06-05 12:10:00 2021-06-05 16:58:00 NA NA
4 B 2 NA NA 2021-06-08 10:58:00 2021-06-08 13:50:00
df_finish2 # Second/alternative desired outcome
# A tibble: 2 x 5
shipment arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dttm> <dttm> <dttm> <dttm>
1 A 2021-01-01 07:00:00 2021-01-01 08:40:00 2021-01-05 10:00:00 2021-01-05 11:30:00
2 B 2021-06-05 12:10:00 2021-06-05 16:58:00 2021-06-08 10:58:00 2021-06-08 13:50:00
What I've researched and tried
Based on this SO question, the following does work:
df_1 <- df_start %>%
group_by(shipment, stop) %>% # Two groupings
summarise_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
filter(row_number()==n())
> df_1
# A tibble: 4 x 6
# Groups: shipment, stop [4]
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 1 2021-01-01 07:00:00 2021-01-01 08:40:00 NA NA
2 A 2 NA NA 2021-01-05 10:00:00 2021-01-05 11:30:00
3 B 1 2021-06-05 12:10:00 2021-06-05 16:58:00 NA NA
4 B 2 NA NA 2021-06-08 10:58:00 2021-06-08 13:50:00
df_2 <- df_start %>%
group_by(shipment) %>% # Single grouping
summarise_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
filter(row_number()==n())
> df_2
# A tibble: 2 x 6
# Groups: shipment [2]
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 2 2021-01-01 07:00:00 2021-01-01 08:40:00 2021-01-05 10:00:00 2021-01-05 11:30:00
2 B 2 2021-06-05 12:10:00 2021-06-05 16:58:00 2021-06-08 10:58:00 2021-06-08 13:50:00
But what I see is that the summarise_all() function and the funs() function are deprecated and not to be used going forward, so I am trying to understand how to use the across() function properly, but without success:
df_3 <- df_start %>%
group_by(shipment) %>%
summarise(across(everything()), na.locf(., na.rm = FALSE, fromLast = FALSE))
> df_3 <- df_start %>%
+ group_by(shipment) %>%
+ summarise(across(everything()), na.locf(., na.rm = FALSE, fromLast = FALSE))
Error: Problem with `summarise()` input `..2`.
x Input `..2` must be size 4 or 1, not 8.
i An earlier column had size 4.
i Input `..2` is `na.locf(., na.rm = FALSE, fromLast = FALSE)`.
i The error occurred in group 1: shipment = "A".
I've read through vignette("colwise"), which describes the differences and suggests I would just replace the syntax as shown above, but clearly I'm not getting it right. Help?
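For reference, here is a minimal sketch of the form the across() call would likely need to take. In the attempt above, the parenthesis after everything() appears to close across() early, so na.locf(...) becomes a second argument to summarise() and `.` refers to the whole piped data frame (8 rows), which would explain the size error. The sketch passes the function inside across() instead. The helper names ts() and fill_last() are hypothetical and not part of dplyr or zoo, tz = "UTC" is set explicitly as an assumption, and wrapping na.locf() in last() is a swapped-in replacement for the filter(row_number() == n()) step, so that each group returns a single row.

```r
library(dplyr)
library(zoo)

# Hypothetical helper: parse timestamps with an explicit time zone (assumption).
ts <- function(x) as.POSIXct(x, tz = "UTC")

# Same values as df_start in the question, rebuilt so the block is self-contained.
df_start <- tibble(
  shipment        = rep(c("A", "B"), each = 4),
  stop            = rep(c(1, 1, 2, 2), 2),
  arrive_pickup   = ts(c("2021-01-01 07:00:00", NA, NA, NA, "2021-06-05 12:10:00", NA, NA, NA)),
  depart_pickup   = ts(c(NA, "2021-01-01 08:40:00", NA, NA, NA, "2021-06-05 16:58:00", NA, NA)),
  arrive_delivery = ts(c(NA, NA, "2021-01-05 10:00:00", NA, NA, NA, "2021-06-08 10:58:00", NA)),
  depart_delivery = ts(c(NA, NA, NA, "2021-01-05 11:30:00", NA, NA, NA, "2021-06-08 13:50:00"))
)

# Hypothetical helper: carry the last observation forward, then keep only the
# final element, so every column yields exactly one value per group.
fill_last <- function(x) dplyr::last(na.locf(x, na.rm = FALSE, fromLast = FALSE))

# Two groupings: the function is the second argument of across(), not of summarise().
df_3 <- df_start %>%
  group_by(shipment, stop) %>%
  summarise(across(everything(), fill_last), .groups = "drop")

# Single grouping: selecting only the timestamp columns leaves stop out of the result.
df_4 <- df_start %>%
  group_by(shipment) %>%
  summarise(across(arrive_pickup:depart_delivery, fill_last), .groups = "drop")
```

Under these assumptions df_3 should have the shape of df_finish1 and df_4 the shape of df_finish2. An anonymous function also works in place of the named helper, e.g. across(everything(), ~ dplyr::last(na.locf(.x, na.rm = FALSE))).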