Count rows in the past year according to a condition using data.table

Question

According to this previous post, I can add a column with a count of occurrences in the past year in the following way:

df[, boundary := date - 365]
df[, counts := df[df, .N,  on = .(id, date < date, date > boundary), by = .EACHI]$N]

This works fine for me. However, I want to count only the occurrences where another column has a specific value. For example, given a dataset like this:

id  type  date
ny  0     2021-09-27   
ny  0     2021-09-09
ny  1     2021-08-01
ny  1     2021-07-07
ch  0     2020-04-01
ch  1     2020-03-01 
ch  0     2020-02-01

I want to count only the rows where type == 1. How can I amend the code above to do this? I tried something like this, but it doesn't work:

df[, counts := df[df, .N(type = 1),  on = .(id, date < date, date > boundary), by = .EACHI]$N]

EDIT: Expected output for the above dataset would be:

id  type  date         counts
ny  0     2021-09-27   2
ny  0     2021-09-09   2
ny  1     2021-08-01   1  
ny  1     2021-07-07   0  
ch  0     2020-04-01   1
ch  1     2020-03-01   0 
ch  0     2020-02-01   0

`date < date` doesn't make any sense since this is always `FALSE`. — Martin Gal, Oct 19 '21 at 08:52
Hmm, it does work for me. I did it in the same way as the accepted post [here](https://stackoverflow.com/questions/42766130/data-frame-how-to-add-a-column-to-count-previous-occurrences-in-the-last-20-wee) — fifigoblin, Oct 19 '21 at 08:54
@RonakShah No, it doesn't, it throws an error that the lengths do not match when joining. — fifigoblin, Oct 19 '21 at 09:00
@RonakShah `Supplied 1511 items to be assigned to 1079 items of column 'counts'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.` — fifigoblin, Oct 19 '21 at 09:04
Hi @RonakShah I added more rows to the datasets in the post, hopefully this explains it. — fifigoblin, Oct 19 '21 at 09:35

score 2 · Accepted Answer · answered Oct 19 '21 at 09:40

You can calculate sum(type == 1) instead of .N. Since that expression is unnamed, the joined result stores it in a column called V1 rather than N.

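setDT(df)                      # convert df to a data.table by reference
df[, boundary := date - 365]   # lower bound of the one-year look-back window

# For each row, sum type == 1 over rows with the same id whose date lies
# strictly between that row's boundary and that row's date
df[, counts := df[df, sum(type == 1),
      on = .(id, date < date, date > boundary), by = .EACHI]$V1]
df[is.na(counts), counts := 0]  # rows with no match in the join come back as NA
df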

#   id type       date   boundary counts
#1: ny    0 2021-09-27 2020-09-27      2
#2: ny    0 2021-09-09 2020-09-09      2
#3: ny    1 2021-08-01 2020-08-01      1
#4: ny    1 2021-07-07 2020-07-07      0
#5: ch    0 2020-04-01 2019-04-02      1
#6: ch    1 2020-03-01 2019-03-02      0
#7: ch    0 2020-02-01 2019-02-01      0
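
For reference, here is a minimal reproducible sketch built from the sample data in the question. It assumes type is numeric and date is a Date; the names hits and counts_alt are only illustrative. In on, the left-hand side of each condition refers to the outer table and the right-hand side to the table passed in i, so date < date compares each candidate row's date with the current row's date. Two small variations are shown: na.rm = TRUE should make the separate NA replacement unnecessary, and filtering to type == 1 before the join lets .N do the counting directly.

```r
library(data.table)

# Sample data from the question (type assumed numeric, date assumed Date)
df <- data.table(
  id   = c("ny", "ny", "ny", "ny", "ch", "ch", "ch"),
  type = c(0, 0, 1, 1, 0, 1, 0),
  date = as.Date(c("2021-09-27", "2021-09-09", "2021-08-01", "2021-07-07",
                   "2020-04-01", "2020-03-01", "2020-02-01"))
)

# Lower bound of the one-year look-back window
df[, boundary := date - 365]

# Variant 1: sum the condition inside the join; na.rm = TRUE turns the
# NA produced by unmatched rows into 0
df[, counts := df[df, sum(type == 1, na.rm = TRUE),
                  on = .(id, date < date, date > boundary),
                  by = .EACHI]$V1]

# Variant 2: keep only type == 1 rows on the lookup side, then count matches
hits <- df[type == 1]  # illustrative helper table
df[, counts_alt := hits[df, .N,
                        on = .(id, date < date, date > boundary),
                        by = .EACHI]$N]

print(df[, .(id, type, date, counts, counts_alt)])
# counts: 2 2 1 0 1 0 0, matching the expected output in the question
```

Variant 2 may also be cheaper on large tables, since the join only has to search the type == 1 rows.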

Exactly this - thank you :) — fifigoblin, Oct 19 '21 at 09:56
