-1

I have a dataset with a Key and a Date Column.

I want to create a column which counts the number of occurrences by Key in the 30 days before the value in Date column.

I'm having trouble because the condition changes in every line of the dataset.

How can I do this?

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
  • Sample data would be much appreciated, RobertoBuzzini. Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! – r2evans Apr 26 '21 at 15:14

1 Answers1

0

Try this:


# first some data
library(lubridate)
library(data.table)

n <- 1e3

set.seed(100)
dat <- data.frame(
    Key = paste0( "Key-", sample( x=LETTERS[1:10], size=n, replace=TRUE ) ),
    Date = ymd("2018-01-01") + ceiling( runif( n=n, -400, +400 ) )
) %>% arrange( Date )
setDT(dat)

dat[, KeyCount := sum( Key == dat$Key & between( dat$Date, Date - 30, Date + 30 ) ), by=1:nrow(dat) ]

dat

first.data <- dat$Date[1]

dat[ Key == "Key-C" & Date < ymd("2016-11-28")+60, .(.SD, pluss30 = Date > ymd("2016-11-28")+30)  ]

dat produces this:


> dat
        Key       Date KeyCount
   1: Key-C 2016-11-28        6
   2: Key-A 2016-11-29        4
   3: Key-E 2016-11-30        5
   4: Key-D 2016-12-01        7
   5: Key-H 2016-12-02        6
  ---                          
 996: Key-A 2019-02-01        2
 997: Key-I 2019-02-03        4
 998: Key-D 2019-02-04        4
 999: Key-B 2019-02-05        5
1000: Key-J 2019-02-05        4

And inspecting the first row for controll, we see this:


> dat[ Key == first.key & Date < first.date+60, .(.SD, pluss30 = Date > first.date+30)  ]
   .SD.Key   .SD.Date .SD.KeyCount pluss30
1:   Key-C 2016-11-28            6   FALSE
2:   Key-C 2016-12-10            7   FALSE
3:   Key-C 2016-12-18            8   FALSE
4:   Key-C 2016-12-18            8   FALSE
5:   Key-C 2016-12-22            9   FALSE
6:   Key-C 2016-12-23            9   FALSE
7:   Key-C 2017-01-09           11    TRUE
8:   Key-C 2017-01-14           11    TRUE
9:   Key-C 2017-01-18           10    TRUE

The first key count seems to be correct at least (6 rows wihtin the first 30 days in the set, and its number says 6)

Sirius
  • 5,224
  • 2
  • 14
  • 21