0

I would like to calculate the date difference between each entry. The data looks like this

> dt <- data.table(id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), date = as.Date(c("2017-01-01", "2017-02-01", "2017-05-01", "2017-01-01", "2017-05-01", "2017-10-01", "2017-01-01", "2017-02-01", "2017-02-15")))
> dt
   id       date
1:  A 2017-01-01
2:  A 2017-02-01
3:  A 2017-05-01
4:  B 2017-01-01
5:  B 2017-05-01
6:  B 2017-10-01
7:  C 2017-01-01
8:  C 2017-02-01
9:  C 2017-02-15

and what I want to get is sth looks like following, how should I construct var "Diff"?

enter image description here

Update:

I tried to solve this by using following codes:

> dt <- data.table(id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), date = as.Date(c("2017-01-01", "2017-02-01", "2017-05-01", "2017-01-01", "2017-05-01", "2017-10-01", "2017-01-01", "2017-02-01", "2017-02-15")))
> dt %>% 
+     group_by(id) %>% 
+     mutate(diff = date - lag(date))
# A tibble: 9 x 3
# Groups:   id [3]
  id    date       diff     
  <chr> <date>     <drtn>   
1 A     2017-01-01   NA days
2 A     2017-02-01   31 days
3 A     2017-05-01   89 days
4 B     2017-01-01 -120 days
5 B     2017-05-01  120 days
6 B     2017-10-01  153 days
7 C     2017-01-01 -273 days
8 C     2017-02-01   31 days
9 C     2017-02-15   14 days

I am not sure what I did wrong. Any idea?

Stataq
  • 2,237
  • 6
  • 14

4 Answers4

1

how about this:

dt$diff <- unlist(lapply(split(dt,dt$id), function(x) c(0,diff(x$date)) ))

output:

> dt 
   id       date diff
1:  A 2017-01-01    0
2:  A 2017-02-01   31
3:  A 2017-05-01   89
4:  B 2017-01-01    0
5:  B 2017-05-01  120
6:  B 2017-10-01  153
7:  C 2017-01-01    0
8:  C 2017-02-01   31
9:  C 2017-02-15   14
Daniel O
  • 4,258
  • 6
  • 20
  • This is a very clever way. i did not know that before you show me. Many thanks. If I want to get the max(diff) by id, What should I do? Please guide me. Thanks in advance. :-) – Stataq Jun 09 '20 at 13:41
  • This is the code I tried"dt$Duration <- unlist(lapply(split(dt,dt$id), function(x) c(max(x$diff)) ))" – Stataq Jun 09 '20 at 13:53
  • you just want the max diff? `lapply(split(dt,dt$id), function(x) max(diff(x$date)))` , you can change `lapply` to `sapply` for a different output. – Daniel O Jun 09 '20 at 13:57
  • Thanks so much for the prompt reply. Itried following codes and still get the error codes. what should i do in order to add variable to each record with max diff info :dt$max <- unlist(lapply(split(dt,dt$id), function(x) max(diff(x$date)))) Error in set(x, j = name, value = value) : Supplied 3 items to be assigned to 9 items of column 'max'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code. – Stataq Jun 09 '20 at 14:08
  • 1
    you'll want to repeat the number by the correct amount for each id. `unlist(lapply(split(dt,dt$id), function(x) rep(max(diff(x$date)),nrow(x))))` – Daniel O Jun 09 '20 at 14:13
  • Thanks so much for your guidance. Your kindness and patience are highly appreciated. I am a beginner(just start learning r for 4 weeks) and was worried about asking too many questions. Can't say enough thanks to you for your kindness. Yes. It works! – Stataq Jun 09 '20 at 14:25
0

Maybe not elegant by maybe performant using diff on all and overwrite between the id.

dt$diff <- c(NA, diff(dt$date))
dt$diff[c(TRUE, dt$id[-1] != dt$id[-nrow(dt)])] <- NA
dt
#  id       date diff
#1  A 2017-01-01   NA
#2  A 2017-02-01   31
#3  A 2017-05-01   89
#4  B 2017-01-01   NA
#5  B 2017-05-01  120
#6  B 2017-10-01  153
#7  C 2017-01-01   NA
#8  C 2017-02-01   31
#9  C 2017-02-15   14
GKi
  • 37,245
  • 2
  • 26
  • 48
0

Tidyverse solution:

library(tidyverse)
dt %>% 
  group_by(id) %>% 
  mutate(diff = date - lag(date))

  id    date       diff    
  <chr> <date>     <drtn>  
1 A     2017-01-01  NA days
2 A     2017-02-01  31 days
3 A     2017-05-01  89 days
4 B     2017-01-01  NA days
5 B     2017-05-01 120 days
6 B     2017-10-01 153 days
7 C     2017-01-01  NA days
8 C     2017-02-01  31 days
9 C     2017-02-15  14 days
Ahorn
  • 3,686
  • 1
  • 10
  • 17
  • I thought of this ,but for some reason I could not get the same results as yours. Could you take a look of my codes and see what I did wrong? Please see my updated post for detail. Many thanks. – Stataq Jun 08 '20 at 19:25
  • Did you maybe load `plyr` after `dplyr`? This should work: `dt %>% dplyr::group_by(id) %>% dplyr::mutate(diff = date - lag(date))` – Ahorn Jun 08 '20 at 19:53
  • You're welcome! Cool, glad it works! – Ahorn Jun 08 '20 at 21:10
0

An option with data.table

library(data.table)
setDT(dt)[, diff := date - shift(date), id]
akrun
  • 874,273
  • 37
  • 540
  • 662