2

First time poster here, so apologies if I get something wrong.

I have a data table that looks like this currently:

dt_achieved<-data.table(last_nm=c("raus","stroper", "degroat","degroat","degroat","degroat","degroat","degroat","piya","mazzy","mazzy","mazzy"),intake_date=c("2021-03-04","2021-06-18","2021-04-14","2021-06-10","2021-07-08","2021-08-09","2021-11-09","2021-12-08","2021-09-16","2021-04-15","2021-08-02","2021-08-09"))

dt_achieved$intake_date<-as.Date(dt_achieved$intake_date)

I would like it to look like this, where the data are grouped by last_nm and then, if a row's intake date is at least 90 days after the group's initial one, it's flagged/its count increases by one on a rolling basis, within group.

dt_ideal<-data.table(last_nm=c("raus","stroper", "degroat","degroat","degroat","degroat","degroat","degroat","piya",
           "mazzy","mazzy","mazzy"),intake_date=c("2021-03-04","2021-06-18","2021-04-14","2021-06-10","2021-07-08","2021-08-09","2021-11-09","2021-12-08","2021-09-16","2021-04-15","2021-08-02","2021-08-09"),intake_round=c(1,1,1,1,1,2,3,3,1,1,2,2), Running_intake=c(1,1,1,2,3,4,5,6,1,1,2,3))

dt_ideal$intake_date<-as.Date(dt_ideal$intake_date)

I have gotten this far:

# order by last name and intake date (the real data is randomized)
setkeyv(dt_achieved, c("last_nm", "intake_date"))

dt_achieved[, intake_round := cumsum(c(TRUE, diff(as.Date(intake_date)) >= 90)), 
    .(last_nm)][, Running_intake := as.numeric(seq_len(.N)), .(last_nm)]

The issue is that the intake_round column currently reflects the difference from the date above rather than looking to the group's initial intake_date to see if it's at least 90 days later. ...I just can't figure out the script to get the cumsum(c(TRUE, ...) function to do this on a rolling basis, within group.

Have consulted a number of questions including:

incremental counter within dataframe only when a condition is met in r

Resetting the cumulative sum when a condition is met in R

Split into groups based on (multiple) conditions?

Cumulative sum that resets when the condition is no longer met

Please, any suggestions would be greatly, greatly appreciated!

MeowMix
  • 21
  • 5
  • 1
    I don't get the same `dt_ideal` result, so I might not be doing what you want, but is `dt_achieved[, cumsum(difftime(intake_date, intake_date[1], units="days") > 90) + 1, by=last_nm]` close? – thelatemail Mar 31 '22 at 00:53
  • Hello, @thelatemail! Yes, it worked perfectly: dt_ideal was made up, so your code worked splendidly, especially when layered with my other code. Wondering how I indicate/flag that you answered my question? – MeowMix Mar 31 '22 at 03:03
  • @thelatemail so I was looking over your recommendation once more (thank you for replying btw!!!), and it's so close to being the perfect solution but it's just a bit off. Your code appears to wait until 90 days has passed, but once it has then every instance in the group thereafter is counted. But I need the cumsum to reset to the next intake_date within group, as the head, and then again when another 90 days has passed for the entirety of the group. I think the key to this lies in the brackets. Am I wrong about this? intake_date[1]. Would something like [.N] convey through the group? – MeowMix Mar 31 '22 at 22:19
  • So after it hits 90 the counter goes back to 0 days difference as of the next value? – thelatemail Apr 01 '22 at 01:02
  • I think that's right! JIC, the best way to describe it is to ask you to take a look at the 4th-5th intake of degroat from dt_ideal. There are 90 days exactly between 08/09/21 and 11/09/21, so the cumsum advances one: 2->3. But, btw the 5th and 6th intake of degroat, there are only 29 days, so the cumsum does not advance one, it remains the same at 3. I think this might be a form of rolling, or perhaps splitting within a group based on a condition, but I might be using those terms inappropriately. Thank you once again for your brain power; I appreciate it, @thelatemail. – MeowMix Apr 01 '22 at 15:28
  • Hi, again! Just checking in to say, I figured it out!! By a combination of three techniques from the following posts: https://stackoverflow.com/questions/69889776/how-to-compare-within-a-group-the-first-value-to-each-subsequent-value-until-a-c?rq=1; https://stackoverflow.com/questions/68424641/incremental-counter-within-dataframe-only-when-a-condition-is-met-in-r; https://stackoverflow.com/questions/57223855/resetting-the-cumulative-sum-when-a-condition-is-met-in-r. As soon as I can figure out how to post code appropriately on here, I will upload my solution. Thanks again, @thelatemail! – MeowMix Apr 03 '22 at 04:42

0 Answers0