Assuming your dataframe is called df, just subtract the date from its lag by group:
library(dplyr)
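df %>%
  # Parse the dd/mm/yyyy strings into Date objects
  mutate(
    date = as.Date(date, format = "%d/%m/%Y")
  ) %>%
  group_by(id) %>%
  # Sort by date within each id so lag() refers to the previous date
  arrange(date, .by_group = TRUE) %>%
  mutate(
    lag_date = lag(date),
    num_days = as.numeric(date - lag_date),
    # The comparison is already logical (NA for the first row of each group)
    thirty_days = num_days > 30
  ) %>%
  select(-lag_date)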
Output:
# Groups:   id [2]
  id    date       `in`  num_days thirty_days
  <chr> <date>     <lgl>    <dbl> <lgl>
1 a     2020-09-24 TRUE        NA NA
2 a     2020-10-22 FALSE       28 FALSE
3 a     2020-11-04 TRUE        13 FALSE
4 a     2020-12-17 TRUE        43 TRUE
5 a     2020-12-28 FALSE       11 FALSE
6 b     2020-01-01 TRUE        NA NA
7 b     2020-01-29 FALSE       28 FALSE
8 b     2020-02-01 TRUE         3 FALSE
9 b     2020-12-31 TRUE       334 TRUE
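If you want to run this end to end, here is a minimal sketch. The original input isn't shown, so the tibble below is an assumption: it is rebuilt from the printed output, with dates as dd/mm/yyyy strings and the rows deliberately shuffled to show why the sort matters.

```r
library(dplyr)

# Assumed input, reconstructed from the printed output; rows are out of order on purpose
df <- tibble(
  id   = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  date = c("22/10/2020", "24/09/2020", "04/11/2020", "28/12/2020", "17/12/2020",
           "29/01/2020", "01/01/2020", "31/12/2020", "01/02/2020"),
  `in` = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

result <- df %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    # Days since the previous date within the same id (NA for the first row)
    num_days    = as.numeric(date - lag(date)),
    thirty_days = num_days > 30
  ) %>%
  ungroup()

print(result)
```

The final ungroup() is optional; it just stops later operations from being silently applied per id.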
Also, it's not great to have a column called `in`, as that's a reserved word in R and has to be wrapped in backticks whenever you refer to it.
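One way around this is to rename the column once, up front. This is only a sketch: the toy data and the new name in_flag are made up for illustration, so pick whatever name suits your data.

```r
library(dplyr)

# Hypothetical toy data with a column named `in`
df <- tibble(id = c("a", "b"), `in` = c(TRUE, FALSE))

# Backticks are needed here because `in` is a reserved word;
# in_flag is an arbitrary replacement name
renamed <- df %>% rename(in_flag = `in`)

names(renamed)
```

After the rename, the column can be used in mutate(), filter() and so on without backticks.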
Edit: Fixed, as I realised the data was not sorted by date.