The recent package update for sparklyr includes fill
which behaves the same way as the tidyr function of the same name. Moreover, we can look at the SQL and apply the LAST function directly. In the example below I partition by "grp" because for my own work I wanted to group it, just remove the group_by/PARTITION BY to do it without groups.
Create the table
a <- data.frame(vals = c(NA,NA,NA,1,NA,NA,NA,NA,NA,6,NA,NA,NA,10), row_num = 1:14, grp = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2))
b <- copy_to(con,a,"example", overwrite = TRUE)
Apply backfill in SQL
c <- sdf_sql(con,"
SELECT*
, LAST(vals, TRUE) OVER (PARTITION BY grp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_val
FROM example
")
Apply backfull in sparklyr
Admittedly, I've had some trouble using the fill function but this is how it should go.
c <- sdf_sql(con, "SELECT * FROM example")
c <- c %>% mutate(last_val = fill(val,.direction="down")
Alternatively you could just apply the SQL in the mutate which for sure works.
c <- sdf_sql(con, "SELECT * FROM example")
c <- c %>% mutate(last_val = sql("LAST LAST(vals, TRUE) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)"))