Create a new variable counting group between dates

Question

I have a dataset with a Key and a Date Column.

I want to create a column which counts the number of occurrences by Key in the 30 days before the value in Date column.

I'm having trouble because the condition changes in every line of the dataset.

How can I do this?

Sample data would be much appreciated, RobertoBuzzini. Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! — r2evans, Apr 26 '21 at 15:14

score 0 · Answer 1 · answered Apr 26 '21 at 15:55

Try this:


# first some data
library(lubridate)
library(data.table)

n <- 1e3

set.seed(100)
dat <- data.frame(
    Key = paste0( "Key-", sample( x=LETTERS[1:10], size=n, replace=TRUE ) ),
    Date = ymd("2018-01-01") + ceiling( runif( n=n, -400, +400 ) )
) %>% arrange( Date )
setDT(dat)

dat[, KeyCount := sum( Key == dat$Key & between( dat$Date, Date - 30, Date + 30 ) ), by=1:nrow(dat) ]

dat

first.data <- dat$Date[1]

dat[ Key == "Key-C" & Date < ymd("2016-11-28")+60, .(.SD, pluss30 = Date > ymd("2016-11-28")+30)  ]

dat produces this:


> dat
        Key       Date KeyCount
   1: Key-C 2016-11-28        6
   2: Key-A 2016-11-29        4
   3: Key-E 2016-11-30        5
   4: Key-D 2016-12-01        7
   5: Key-H 2016-12-02        6
  ---                          
 996: Key-A 2019-02-01        2
 997: Key-I 2019-02-03        4
 998: Key-D 2019-02-04        4
 999: Key-B 2019-02-05        5
1000: Key-J 2019-02-05        4

And inspecting the first row for controll, we see this:


> dat[ Key == first.key & Date < first.date+60, .(.SD, pluss30 = Date > first.date+30)  ]
   .SD.Key   .SD.Date .SD.KeyCount pluss30
1:   Key-C 2016-11-28            6   FALSE
2:   Key-C 2016-12-10            7   FALSE
3:   Key-C 2016-12-18            8   FALSE
4:   Key-C 2016-12-18            8   FALSE
5:   Key-C 2016-12-22            9   FALSE
6:   Key-C 2016-12-23            9   FALSE
7:   Key-C 2017-01-09           11    TRUE
8:   Key-C 2017-01-14           11    TRUE
9:   Key-C 2017-01-18           10    TRUE

The first key count seems to be correct at least (6 rows wihtin the first 30 days in the set, and its number says 6)

Create a new variable counting group between dates

1 Answers1