-1

I'm trying to calculate the difference in days of my rows to create interval.

My data set called temp looks like this,

ID  Event
31933   11/12/2016
31933   11/14/2016
31750   09/04/2016
31750   09/10/2016
31750   09/30/2016
31750   10/01/2016
30995   09/04/2016
30995   09/09/2016
30995   09/10/2016
30995   9/24/2016

So my question is how can I calculate the difference between dates in day by ID? So for ID 31933 it is 2 days and for 31750 6, 20 and 1 days. I've tried several options which were given in other examples here, such as

library(zoo)
setDT(temp)
Interval<- function(x) difftime(x[3], x[1],units = "days")
temp[, INTERVAL := rollapply(Event, 3, diff, align = "left", fill = NA), by= ID]

The error here was "Type of RHS ('double') must match LHS ('logical'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)"

Also tried a few data.table functions but they did not work.

I'm quite new to R, so I suppose there is a simple solution.

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
Jenny
  • 29
  • 2

4 Answers4

4

With data.table and lubridate:

library(lubridate)
library(data.table)

setDT(df)[, Days := c(NA, diff(mdy(Event))), by=ID]

or:

setDT(df)[, Days := mdy(Event)-lag(mdy(Event)), by=ID]

Result:

       ID      Event    Days
 1: 31933 11/12/2016 NA days
 2: 31933 11/14/2016  2 days
 3: 31750 09/04/2016 NA days
 4: 31750 09/10/2016  6 days
 5: 31750 09/30/2016 20 days
 6: 31750 10/01/2016  1 days
 7: 30995 09/04/2016 NA days
 8: 30995 09/09/2016  5 days
 9: 30995 09/10/2016  1 days
10: 30995  9/24/2016 14 days

You can also try the following with dplyr and lubridate:

library(lubridate)
library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Event = mdy(Event),
         Days = Event - lag(Event))

Result:

# A tibble: 10 x 3
# Groups:   ID [3]
      ID      Event    Days
   <int>     <date>  <time>
 1 31933 2016-11-12 NA days
 2 31933 2016-11-14  2 days
 3 31750 2016-09-04 NA days
 4 31750 2016-09-10  6 days
 5 31750 2016-09-30 20 days
 6 31750 2016-10-01  1 days
 7 30995 2016-09-04 NA days
 8 30995 2016-09-09  5 days
 9 30995 2016-09-10  1 days
10 30995 2016-09-24 14 days

Or if you prefer to remove the NA rows:

df %>%
  group_by(ID) %>%
  mutate(Event = mdy(Event),
         Days = Event - lag(Event)) %>%
  filter(Days > 0)

Result:

# A tibble: 7 x 3
# Groups:   ID [3]
     ID      Event    Days
  <int>     <date>  <time>
1 31933 2016-11-14  2 days
2 31750 2016-09-10  6 days
3 31750 2016-09-30 20 days
4 31750 2016-10-01  1 days
5 30995 2016-09-09  5 days
6 30995 2016-09-10  1 days
7 30995 2016-09-24 14 days

Data:

df = structure(list(ID = c(31933L, 31933L, 31750L, 31750L, 31750L, 
31750L, 30995L, 30995L, 30995L, 30995L), Event = structure(c(6L, 
7L, 1L, 3L, 4L, 5L, 1L, 2L, 3L, 8L), .Label = c("09/04/2016", 
"09/09/2016", "09/10/2016", "09/30/2016", "10/01/2016", "11/12/2016", 
"11/14/2016", "9/24/2016"), class = "factor")), .Names = c("ID", 
"Event"), class = "data.frame", row.names = c(NA, -10L))
acylam
  • 18,231
  • 5
  • 36
  • 45
1

There are several problems:

  • the dates should be of "Date" class, not "character" class

  • in R, NA is logical. An NA of type double is written NA_real_ Often it does not matter but in this case it matters due to the way data.table works.

  • if you indent your code 4 spaces then SO will format it for you

  • the desired output is not shown in the question but from the code it is asking for the difference between every other row. We show both the solution for every other row but if you wanted successive rows just replace 2 with 1 in each solution.

Using the above we write it like this:

library(data.table)
library(zoo) 

setDT(temp) 
temp$Event <- as.Date(temp$Event, "%m/%d/%Y")

roll <- function(x, k) rollapply(x, k+1, diff, lag = k, align = "left", fill = NA_real_)
temp[, INTERVAL := roll(as.numeric(Event), 2), by = ID]

giving for the every other row case:

> temp
       ID      Event INTERVAL
 1: 31933 2016-11-12       NA
 2: 31933 2016-11-14       NA
 3: 31750 2016-09-04       26
 4: 31750 2016-09-10       21
 5: 31750 2016-09-30       NA
 6: 31750 2016-10-01       NA
 7: 30995 2016-09-04        6
 8: 30995 2016-09-09       15
 9: 30995 2016-09-10       NA
10: 30995 2016-09-24       NA

This alternative using data.table's shift could also be used and only requires data.table:

temp[, INTERVAL := as.numeric(shift(Event, 2, type = "lead") - Event), by = ID]

If you had intended successive rows rather than every other row replace 2 in either of the above solutions with 1.

Note

The input in reproducible form is:

Lines <- "ID Event 
31933 11/12/2016 
31933 11/14/2016 
31750 09/04/2016 
31750 09/10/2016 
31750 09/30/2016 
31750 10/01/2016 
30995 09/04/2016 
30995 09/09/2016 
30995 09/10/2016 
30995 09/24/2016"
temp <- read.table(text = Lines, header = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This addresses the errors that OP was having, but it doesn't give the desired output. – acylam Mar 21 '18 at 13:51
  • The poster of the question did not specify the desired output but from the code that was posted in question I think that it is what was desired. – G. Grothendieck Mar 21 '18 at 13:55
  • It might not be obvious, but I think OP did: _"how can I calculate the difference between dates in day by ID? So for ID 31933 it is 2 days and for 31750 6, 20 and 1 days"_ – acylam Mar 21 '18 at 13:56
  • Yes, but it does not say successive dates but just dates and the code clarifies it to be the date on every other row. In any event, in the absence of output which the question should have shown we don't know for sure and I have included both situations now just in case. – G. Grothendieck Mar 21 '18 at 14:29
0

The date class is stored in a format that measures dates by days, so you can perform simple arthimetic with them, as per this SO thread.

It uses a YYYY/MM/DD format. For example

abs(as.Date("2016/11/12") - as.Date("2016/11/14"))
Time difference of 2 days

If you reformat your dates to YYYY/MM/DD, you should be able to use, for example, abs(temp[1, 2] - temp[2, 2]) to determine the difference between the dates in the first two rows.

Community
  • 1
  • 1
MBorg
  • 1,345
  • 2
  • 19
  • 38
0

Great thanks for all your suggestions. I figured it out.

temp<- data.table(ID,Event, key = c("ID", "Event"))
temp[,INTER := c(0,'units<-'(diff(Event), "days")),by= ID]

and then merged it with my dataset. Suppose its not very elegant but it worked.

Uwe
  • 41,420
  • 11
  • 90
  • 134
Jenny
  • 29
  • 2