Consider the sample dataset below:

structure(list(store_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), date_invoiced = structure(c(18231, 
18233, 18235, 18237, 18239, 18241, 18243, 18245, 18247, 18249, 
18251, 18253, 18255, 18257, 18259, 18261, 18231, 18233, 18235, 
18237, 18239, 18241, 18243, 18245, 18247, 18249, 18251, 18253, 
18255, 18257, 18259, 18261), class = "Date"), is_good_sale = c(TRUE, 
TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, 
FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
TRUE, TRUE)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-32L), .Names = c("store_id", "date_invoiced", "is_good_sale"
))

The column 'is_good_sale' is a logical column: TRUE means the store recorded a good sale on that date, and FALSE means it had a bad sale.

I would like to add a new column to this dataframe that considers the last 10 days of data for each store (group_by) and returns TRUE if those last ten rows are all TRUE, else FALSE.

For example, for the date '2019-12-31', I want the last ten rows (that is, the data from 2019-12-20 to 2019-12-30) to be considered, and the outcome of the condition to be recorded for 2019-12-31 in the same dataframe.

JBH

1 Answer

You can use rollapplyr from the zoo package:

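library(dplyr)

df %>%
  # sort so that each window runs in date order within a store
  arrange(store_id, date_invoiced) %>%
  group_by(store_id) %>%
  # right-aligned window of 10 rows; all() is TRUE only if every value is TRUE
  mutate(last_10_good_sale = zoo::rollapplyr(is_good_sale, 10, all, fill = NA))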

This gives NA for the first 9 rows of each store_id. From the 10th row onward, it returns TRUE only if the current value and the 9 values before it are all TRUE, and FALSE otherwise.
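
Note that this window includes the current row, whereas the question asks for the ten rows before the current date. One possible way to shift the window is to lag the rolling result by one row within each store. The following is only a sketch under assumptions: it treats the window as row-based (not calendar-day based), and `toy`, `res`, and `prev_10_good_sale` are hypothetical names used for illustration.

```r
library(dplyr)

# Hypothetical toy data: one store, 12 consecutive invoices,
# with a single bad sale on the first row
toy <- tibble(
  store_id      = 1L,
  date_invoiced = as.Date("2019-12-01") + 0:11,
  is_good_sale  = c(FALSE, rep(TRUE, 11))
)

res <- toy %>%
  arrange(store_id, date_invoiced) %>%
  group_by(store_id) %>%
  mutate(
    # window = current row plus the 9 rows before it
    last_10_good_sale = zoo::rollapplyr(is_good_sale, 10, all, fill = NA),
    # shifted by one row: window = the 10 rows strictly before the current row
    prev_10_good_sale = dplyr::lag(last_10_good_sale)
  ) %>%
  ungroup()

print(res, n = 12)
```

With this toy input, `last_10_good_sale` is FALSE on row 10 (its window still contains the bad first row) and TRUE afterwards, while `prev_10_good_sale` stays NA through row 10 because fewer than ten earlier rows exist until row 11.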

Ronak Shah