
I am trying to count the number of occurrences of an event within 1 year before the date of each row. I have come up with the number of days since the last event, but cannot figure out how to continue, as I need to look back 365 days rather than just look forward from the date of the last dummy.

I need the counts only when level is not NA, but that is not a big issue.

EDIT:

I have added 14 more rows to show another example where level is not actually NA.

After help from jay.sf, the current result gives:

Row 33 = 1, but I would like Row 33 = 0 because there were no occurrences within 1 year before.

Similarly for row 37.

Row 39 = 2, but there was only 1 occurrence before, not counting today's.

Therefore I believe I need to change the code so that a row's own dummyflag is only counted from the next row onwards (see the rough sketch after the desired output below).

library(data.table)

dtIhave2 = data.table(
  id = c(rep(1,17),rep(2,13),rep(3,14)),
  date = c(as.Date("2014-12-05"),
           as.Date("2015-01-23"),
           as.Date("2015-03-06"),
           as.Date("2015-05-15"),
           as.Date("2015-08-06"),
           as.Date("2015-10-29"),
           as.Date("2016-01-21"),
           as.Date("2016-04-06"),
           as.Date("2016-07-11"),
           as.Date("2016-10-03"),
           as.Date("2016-11-11"),
           as.Date("2016-12-07"),
           as.Date("2017-10-25"),
           as.Date("2018-01-09"),
           as.Date("2018-02-12"),
           as.Date("2018-07-04"),
           as.Date("2018-11-30"),
           as.Date("2014-05-14"),
           as.Date("2014-09-03"),
           as.Date("2014-09-04"),
           as.Date("2014-10-15"),
           as.Date("2014-11-08"),
           as.Date("2014-12-05"),
           as.Date("2014-12-18"),
           as.Date("2014-12-20"),
           as.Date("2014-12-23"),
           as.Date("2015-05-15"),
           as.Date("2015-08-19"),
           as.Date("2016-06-23"),
           as.Date("2017-04-21"),
           
           as.Date("2015-01-03"),
           as.Date("2015-02-13"),
           as.Date("2015-06-01"),#
           as.Date("2015-09-05"),
           as.Date("2015-12-01"),
           as.Date("2016-06-10"),
           as.Date("2016-10-16"),#
           as.Date("2016-12-15"),
           as.Date("2017-04-30"),#
           as.Date("2017-06-23"),
           as.Date("2017-10-01"),
           as.Date("2017-12-01"),
           as.Date("2018-03-10"),
           as.Date("2018-06-02")
           
           ),
  
  level = c(rnorm(10,7,1),
            NA,
            rnorm(9,7,1),
            NA,NA,
            7,
            NA,NA,NA,
            rnorm(4,7,1),
            rnorm(14,7,1)), 
  
  dummyflag = c(rep(0 ,10), 
                1,
                rep(0,9),
                1,
                1,
                0,
                1,
                1,
                1,
                rep(0,4),
                rep(0,2),
                1,
                rep(0,3),
                1,
                rep(0,1),
                1,
                rep(0,5)),
  
  dayssincedummy = c(rep(NA,11),
                     26,348,424,458,600,749,
                     rep(NA,4),
                     24,27,40,2,3,143,239,548,850,
                     rep(NA,3),
                     96,
                     183,
                     375,
                     503,
                     60,
                     196,
                     54,
                     154,
                     215,
                     314,
                     398
                     
                     )
)

dtIhave2$within1yr = sapply(seq_len(nrow(dtIhave2)), function(i)
  dtIhave2[date %between% rev(seq.Date(date[i], length.out = 2, by = '-1 year')) &
             id == id[i] & !is.na(level[i]),
           sum(dummyflag %in% 1)])

> dtIhave2
    id       date    level dummyflag dayssincedummy within1yr
 1:  1 2014-12-05 7.977480         0             NA         0
 2:  1 2015-01-23 7.589833         0             NA         0
 3:  1 2015-03-06 7.301062         0             NA         0
 4:  1 2015-05-15 6.739734         0             NA         0
 5:  1 2015-08-06 5.682534         0             NA         0
 6:  1 2015-10-29 6.659627         0             NA         0
 7:  1 2016-01-21 7.159197         0             NA         0
 8:  1 2016-04-06 9.957324         0             NA         0
 9:  1 2016-07-11 6.607859         0             NA         0
10:  1 2016-10-03 7.093568         0             NA         0
11:  1 2016-11-11       NA         1             NA         0
12:  1 2016-12-07 5.527618         0             26         1
13:  1 2017-10-25 6.055255         0            348         1
14:  1 2018-01-09 6.031328         0            424         0
15:  1 2018-02-12 5.875067         0            458         0
16:  1 2018-07-04 6.875352         0            600         0
17:  1 2018-11-30 8.439167         0            749         0
18:  2 2014-05-14 7.381595         0             NA         0
19:  2 2014-09-03 7.325306         0             NA         0
20:  2 2014-09-04 8.101320         0             NA         0
21:  2 2014-10-15       NA         1             NA         0
22:  2 2014-11-08       NA         1             24         0
23:  2 2014-12-05 7.000000         0             27         2
24:  2 2014-12-18       NA         1             40         0
25:  2 2014-12-20       NA         1              2         0
26:  2 2014-12-23       NA         1              3         0
27:  2 2015-05-15 7.211657         0            143         5
28:  2 2015-08-19 7.274550         0            239         5
29:  2 2016-06-23 7.216593         0            548         0
30:  2 2017-04-21 6.516086         0            850         0
31:  3 2015-01-03 7.945201         0             NA         0
32:  3 2015-02-13 8.417933         0             NA         0
33:  3 2015-06-01 9.290180         1             NA         1
34:  3 2015-09-05 8.400137         0             96         1
35:  3 2015-12-01 8.115692         0            183         1
36:  3 2016-06-10 7.322929         0            375         0
37:  3 2016-10-16 4.946102         1            503         1
38:  3 2016-12-15 9.435223         0             60         1
39:  3 2017-04-30 6.671779         1            196         2
40:  3 2017-06-23 6.994869         0             54         2
41:  3 2017-10-01 7.540090         0            154         2
42:  3 2017-12-01 7.332589         0            215         1
43:  3 2018-03-10 7.779732         0            314         1
44:  3 2018-06-02 6.068338         0            398         0
    id       date    level dummyflag dayssincedummy within1yr
> dtIwant2
    id       date    level dummyflag dayssincedummy within1yr
 1:  1 2014-12-05 7.977480         0             NA         0
 2:  1 2015-01-23 7.589833         0             NA         0
 3:  1 2015-03-06 7.301062         0             NA         0
 4:  1 2015-05-15 6.739734         0             NA         0
 5:  1 2015-08-06 5.682534         0             NA         0
 6:  1 2015-10-29 6.659627         0             NA         0
 7:  1 2016-01-21 7.159197         0             NA         0
 8:  1 2016-04-06 9.957324         0             NA         0
 9:  1 2016-07-11 6.607859         0             NA         0
10:  1 2016-10-03 7.093568         0             NA         0
11:  1 2016-11-11       NA         1             NA         0
12:  1 2016-12-07 5.527618         0             26         1
13:  1 2017-10-25 6.055255         0            348         1
14:  1 2018-01-09 6.031328         0            424         0
15:  1 2018-02-12 5.875067         0            458         0
16:  1 2018-07-04 6.875352         0            600         0
17:  1 2018-11-30 8.439167         0            749         0
18:  2 2014-05-14 7.381595         0             NA         0
19:  2 2014-09-03 7.325306         0             NA         0
20:  2 2014-09-04 8.101320         0             NA         0
21:  2 2014-10-15       NA         1             NA         0
22:  2 2014-11-08       NA         1             24         0
23:  2 2014-12-05 7.000000         0             27         2
24:  2 2014-12-18       NA         1             40         0
25:  2 2014-12-20       NA         1              2         0
26:  2 2014-12-23       NA         1              3         0
27:  2 2015-05-15 7.211657         0            143         5
28:  2 2015-08-19 7.274550         0            239         5
29:  2 2016-06-23 7.216593         0            548         0
30:  2 2017-04-21 6.516086         0            850         0
31:  3 2015-01-03 7.945201         0             NA         0
32:  3 2015-02-13 8.417933         0             NA         0
33:  3 2015-06-01 9.290180         1             NA         0
34:  3 2015-09-05 8.400137         0             96         1
35:  3 2015-12-01 8.115692         0            183         1
36:  3 2016-06-10 7.322929         0            375         0
37:  3 2016-10-16 4.946102         1            503         0
38:  3 2016-12-15 9.435223         0             60         1
39:  3 2017-04-30 6.671779         1            196         1
40:  3 2017-06-23 6.994869         0             54         2
41:  3 2017-10-01 7.540090         0            154         2
42:  3 2017-12-01 7.332589         0            215         1
43:  3 2018-03-10 7.779732         0            314         1
44:  3 2018-06-02 6.068338         0            398         0
    id       date    level dummyflag dayssincedummy within1yr
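
To make the idea above concrete, something like the following is roughly what I am after (an untested sketch of the same sapply() expression, just with an extra date < date[i] condition so the current row's own flag is not counted; within1yr2 is only a placeholder name):

dtIhave2$within1yr2 = sapply(seq_len(nrow(dtIhave2)), function(i)
  # same 1-year window as before, but only count flags on strictly earlier
  # dates, so the current row's own dummyflag is ignored (untested sketch)
  dtIhave2[date %between% rev(seq.Date(date[i], length.out = 2, by = '-1 year')) &
             date < date[i] &
             id == id[i] & !is.na(level[i]),
           sum(dummyflag %in% 1)])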



  • Does this answer your question? [Count observations over rolling 30 day window](https://stackoverflow.com/questions/71159724/count-observations-over-rolling-30-day-window) – zephryl Mar 20 '22 at 13:13
  • FYI, `c(as.Date(".."), as.Date(".."), as.Date(".."), ...)` is the same as `as.Date(c("..", "..", "..", ...))`, though the latter is written in far fewer characters and is more readable (imo). – r2evans Mar 20 '22 at 16:54

2 Answers


Try `seq.Date` with `'-1 year'` and loop over the rows using `sapply`.

library(data.table)
sapply(seq_len(nrow(dtIhave)), \(i) 
       dtIhave[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) & 
                 id == id[i] & !is.na(level[i]), sum(dummyflag %in% 1)])
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 2 0 0 0 5 5 0 0

You can also use `'-365 days'`, but `'-1 year'` additionally accounts for leap years.
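
In case it helps to see what `%between%` actually receives, here is a tiny standalone illustration (the date below is just an example value):

d <- as.Date("2016-12-07")                        # example date
seq.Date(d, length.out = 2, by = '-1 year')       # "2016-12-07" "2015-12-07"
rev(seq.Date(d, length.out = 2, by = '-1 year'))  # "2015-12-07" "2016-12-07"
# rev() puts the bounds in ascending order, which is what %between% expects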

Edit

For your updated case, replace `!is.na(level[i])` with `dummyflag[i] == 0`.

sapply(seq_len(nrow(dtIhave2)), \(i) 
       dtIhave2[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) & 
                 id == id[i] & dummyflag[i] == 0, sum(dummyflag %in% 1)])
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 2 0 0 0 5 5 0 0 0 0 0 1 1 0 0 1 0 2 2 1 1 0
jay.sf
  • I have been getting Error: unexpected input in "sapply(seq_len(nrow(dtIhave)), \". Could you let me know what the \(i) does? It may be the issue and I may be able to fix it – Jantje Houten Mar 21 '22 at 08:42
  • @JantjeHouten Update your R or use `function(i)` instead of `\(i)`. – jay.sf Mar 21 '22 at 08:51
  • Hi jay.sf, I have updated the question by adding another example (id=3) for level is not NA. If I have level is not NA, then I realized that I have the problem where the counting begins on each dummyflag. How can I only consider counting the row after each dummyflag? Hope I am making sense, thank you. – Jantje Houten Mar 22 '22 at 08:26
  • @JantjeHouten Hi, please see updated answer. – jay.sf Mar 22 '22 at 08:50
  • Hi jay.sf, I am having the issue where I think the `!is.na` and `dummyflag` conditions have to work together; for example, in row 39 it should be 1 because there was a dummy 196 days ago. If we only look at when `dummyflag == 0` then we will miss this case. Thank you – Jantje Houten Mar 22 '22 at 12:00

Solution:

dtIhave[, res := fifelse(
  is.na(level), 0,
  dtIhave[id == .BY$id & between(date, .BY$date - 365, .BY$date) & dummyflag == 1, .N]),
  by = .(id, date)
]

Explanation:

This uses a single if-else (data.table's `fifelse()`), evaluated within each `id` and `date` group.

  • If the level is NA, then the result column is 0.
  • If the level is not NA, then we filter dtIhave down to rows with this id (`id==.BY$id`), where the date is between this date minus 365 days (`.BY$date-365`) and this date (`.BY$date`), and where `dummyflag==1`, and then we count those rows using `.N`.

The special symbol `.BY` is available in `j`; it holds the current group's values of the `by` columns in a list, i.e. `.BY$id` holds the value of `id` and `.BY$date` holds the value of `date` for the current group.
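
A minimal standalone example of `.BY` (toy data, just to illustrate):

library(data.table)
toy = data.table(g = c("a", "a", "b"), x = 1:3)
# .BY is a named list of the current group's by-column values, usable inside j
toy[, .(label = paste0("group_", .BY$g), total = sum(x)), by = g]
#    g   label total
# 1: a group_a     3
# 2: b group_b     3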

Output:

    id       date     level dummyflag dayssincedummy res
 1:  1 2014-12-05  6.831267        NA             NA   0
 2:  1 2015-01-23  7.167449        NA             NA   0
 3:  1 2015-03-06  6.500918        NA             NA   0
 4:  1 2015-05-15  7.267101        NA             NA   0
 5:  1 2015-08-06  6.463343        NA             NA   0
 6:  1 2015-10-29  7.685856        NA             NA   0
 7:  1 2016-01-21  6.465524        NA             NA   0
 8:  1 2016-04-06  7.602419        NA             NA   0
 9:  1 2016-07-11  7.339648        NA             NA   0
10:  1 2016-10-03  6.049635        NA             NA   0
11:  1 2016-11-11        NA         1             NA   0
12:  1 2016-12-07  6.639634        NA             26   1
13:  1 2017-10-25  7.951767        NA            348   1
14:  1 2018-01-09  5.444352        NA            424   0
15:  1 2018-02-12  8.972908        NA            458   0
16:  1 2018-07-04  7.084616        NA            600   0
17:  1 2018-11-30  5.602063        NA            749   0
18:  2 2014-05-14  7.120637        NA             NA   0
19:  2 2014-09-03  7.260747        NA             NA   0
20:  2 2014-09-04  7.676648        NA             NA   0
21:  2 2014-10-15        NA         1             NA   0
22:  2 2014-11-08        NA         1             24   0
23:  2 2014-12-05  7.000000        NA             27   2
24:  2 2014-12-18        NA         1             40   0
25:  2 2014-12-20        NA         1              2   0
26:  2 2014-12-23        NA         1              3   0
27:  2 2015-05-15  6.137783        NA            143   5
28:  2 2015-08-19  7.088102        NA            239   5
29:  2 2016-06-23  7.620440        NA            548   0
30:  2 2017-04-21 10.325672        NA            850   0
langtang