-2

I'd really appreciate some help on a problem I'm struggling to resolve in R.

I have a data frame with a series of IDs, dates and treatments. My end goal is to count the number of events that happen to an ID by treatment within a given timeframe.

For example,

ID has treatment A within twice within the space of three months, and four times within six months. I expect to have a series of conditional columns which count the number of occurrences.

The data frame follows a similar structure to:

ID  date        treatment  
1A   20/09/2015  A
1A   21/09/2015  B
1A   22/10/2015  A
2A   22/09/2015  C
2A   20/10/2015  C

My end goal would be to have something like...

ID  date        treatment    
1A  01/01/2016  A     
1A  01/03/2016  A        
1A  01/04/2016  A
1A  01/05/2016  A
1A  01/11/2016  A
2A  01/01/2016  A
2A  01/09/2016  A

Grouping to...

ID      a_within_3_months  a_within_6_months...
1A      3                  1
2A      0                  0

I'm sure this must be possible in data.table, but I'm struggling to figure out how to calculate this over rows by the conditions I want.

I hope this is clear - happy to provide more detail is helpful.

Really appreciate any help with this issue! Thank you for your time.

shbfy
  • 2,075
  • 3
  • 16
  • 37
  • not clear to me what "within 3 months" means - is it max within any 3 consecutive months, or is it from first date or ...? – eddi Sep 30 '16 at 21:53
  • 1
    I think you'll need to create a reproducible example. Some guidance: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Sep 30 '16 at 21:54
  • Thanks For your response Eddi. So in that case I would like to count how many times the ID has another treatment within three months from the first date – shbfy Sep 30 '16 at 21:55
  • Thanks Frank, I'll take a look and try and create this. – shbfy Sep 30 '16 at 21:59
  • When replying to a comment, please use `@` to notify the person you're responding to; so e.g. your last comments should have included `@Eddi` and `@Frank` so that those users would receive notifications. It's very probable that those users never saw your last two comments. See [here](http://meta.stackoverflow.com/q/335607/1858225) (which is how I found this question). – Kyle Strand Oct 04 '16 at 20:16

1 Answers1

0

This might be what you are looking for:

> first_date <- as.Date(
         as.character(20140612),
          "%Y%m%d")
> data<- data.frame(
       ID=c(rep(1,5), rep(2,5)), 
        date=seq(first_date, by="1 day", length.out=10),
        trtm=c(rep("a",3), rep("b",2), rep("c",3), rep("d",2)))

data
ID       date trtm
1 2014-06-12    a
1 2014-06-13    a
1 2014-06-14    a
1 2014-06-15    b
1 2014-06-16    b
2 2014-06-17    c
2 2014-06-18    c
2 2014-06-19    c
2 2014-06-20    d
2 2014-06-21    d

> data <- data.table(data)

> data[,.( within=max(date)-min(date),       
           n_of_trtm=length(date) ), 
        by=.(ID,trtm)]

    ID trtm within n_of_trtm
 1:  1    a 2 days         3
 2:  1    b 1 days         2
 3:  2    c 2 days         3
 4:  2    d 1 days         2
cecconeurale
  • 108
  • 6