Create a variable that group by number of dates in R

Question

I want to create a variable that is "TRUE" for each observation made at least 30 days after the last observation that was "TRUE", separately for each ID.

The desired result is:

id  date    in
a   24/09/2020  TRUE
a   22/10/2020  FALSE
a   04/11/2020  TRUE
a   17/12/2020  TRUE
a   28/12/2020  FALSE
b   01/01/2020  TRUE
b   29/01/2020  FALSE
b   31/12/2020  TRUE
b   01/02/2020  TRUE

I have tried to answer what I think you're asking but I think this question could be clearer. It would be helpful to include some desired output, and also include your data.frame in a more easily reproducible format using `dput()`. See here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — SamR, Apr 04 '22 at 10:16

Maël · Accepted Answer · 2022-04-04T10:51:58.570

2

Using `accumulate()` from purrr:

library(tidyverse)
library(lubridate)

df %>% 
  group_by(id) %>% 
  mutate(date = dmy(date),
         .in = accumulate(abs(diff(date)), .init = 30, .f = ~ ifelse(.x < 30, .x + .y, .y)) >= 30)


  id       date   in.   .in
1  a 2020-09-24  TRUE  TRUE
2  a 2020-10-22 FALSE FALSE
3  a 2020-11-04  TRUE  TRUE
4  a 2020-12-17  TRUE  TRUE
5  a 2020-12-28 FALSE FALSE
6  b 2020-01-01  TRUE  TRUE
7  b 2020-01-29 FALSE FALSE
8  b 2020-12-31  TRUE  TRUE
9  b 2020-02-01  TRUE  TRUE
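To make the running-gap idea more concrete, here is a minimal base R sketch of the same rule written as an explicit loop. It is not the code from this answer: `flag_every()` and the `flag` column are hypothetical names, and it assumes the dates are already parsed with `as.Date()`. It takes the absolute gap to the last flagged date, which mirrors the `abs()` in the call above and reproduces the flags for this data even though id b is not sorted; on other unsorted data the two approaches can differ.

```r
# Hypothetical helper (not from the original answer): flag an observation
# when at least `gap` days separate it from the last flagged observation.
flag_every <- function(dates, gap = 30) {
  flagged <- logical(length(dates))
  last_true <- NA
  for (i in seq_along(dates)) {
    # The first observation has no flagged predecessor, so it is flagged.
    if (is.na(last_true) || abs(as.numeric(dates[i] - last_true)) >= gap) {
      flagged[i] <- TRUE
      last_true <- dates[i]
    }
  }
  flagged
}

df <- data.frame(
  id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  date = as.Date(
    c("24/09/2020", "22/10/2020", "04/11/2020", "17/12/2020", "28/12/2020",
      "01/01/2020", "29/01/2020", "31/12/2020", "01/02/2020"),
    format = "%d/%m/%Y"
  )
)

# Apply the helper separately to each id and restore the original row order.
df$flag <- unsplit(lapply(split(df$date, df$id), flag_every), df$id)
```

The loop is longer than the `accumulate()` one-liner, but it may be easier to step through when checking why a particular row is flagged.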

data

df <- read.table(header = TRUE, text = "id  date    in
a   24/09/2020  TRUE
a   22/10/2020  FALSE
a   04/11/2020  TRUE
a   17/12/2020  TRUE
a   28/12/2020  FALSE
b   01/01/2020  TRUE
b   29/01/2020  FALSE
b   31/12/2020  TRUE
b   01/02/2020  TRUE")

edited Apr 04 '22 at 10:51

answered Apr 04 '22 at 10:16


This is not grouped. Also, using `accumulate()` is kind of unnecessary here, because you can achieve the same thing simply by lagging – shs Apr 04 '22 at 10:29
This solution accounts for when there is more than one consecutive FALSE value. I'm not sure how that would be possible without repetitive lagging. It's also shorter. – Maël Apr 04 '22 at 10:55
Sorry, I misunderstood the question on that point. You are correct – shs Apr 04 '22 at 11:00

score 1 · Answer 2 · answered Apr 04 '22 at 10:09

Assuming your data frame is called df, subtract the lagged date from each date within each group:

library(dplyr)
df %>%
    mutate(
        date = as.Date(date, format = "%d/%m/%Y")
    ) %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(
        lag_date = lag(date),
        num_days = as.numeric(date - lag_date),
        thirty_days = num_days > 30
    ) %>%
    select(-lag_date)

Output:

# Groups:   id [2]
  id    date       `in`  num_days thirty_days
  <chr> <date>     <lgl>    <dbl> <lgl>
1 a     2020-09-24 TRUE        NA NA
2 a     2020-10-22 FALSE       28 FALSE
3 a     2020-11-04 TRUE        13 FALSE
4 a     2020-12-17 TRUE        43 TRUE
5 a     2020-12-28 FALSE       11 FALSE
6 b     2020-01-01 TRUE        NA NA
7 b     2020-01-29 FALSE       28 FALSE
8 b     2020-02-01 TRUE         3 FALSE
9 b     2020-12-31 TRUE       334 TRUE

Also, it's not great to have a column called `in`, since that is a reserved word in R.
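As a small illustration of why the name is awkward (a sketch with made-up one-row data; `safe` and `raw` are hypothetical names): with its default `check.names = TRUE`, `read.table()` rewrites the header to a syntactically valid name, and if the name is kept as-is the column has to be quoted with backticks.

```r
# Default check.names = TRUE passes column names through make.names(),
# which appends a dot to reserved words.
safe <- read.table(header = TRUE, text = "id in
a TRUE")
names(safe)

# With check.names = FALSE the name is kept, but it must be backtick-quoted.
raw <- read.table(header = TRUE, check.names = FALSE, text = "id in
a TRUE")
raw$`in`
```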

Edit: Fixed after realising the data was not sorted by date.
