Cumulative time with reset

Question

I have a dataset that looks like this:

id      land    datetime
pb1     0       2004-04-05 01:44:00
pb1     1       2004-04-05 02:00:00
pb1     1       2004-04-05 16:00:00 
pb2     1       2004-04-05 18:01:00 
pb2     1       2004-04-05 20:00:00   

library(data.table) 
DT = data.table(
  id = c("pb1", "pb1", "pb1", "pb2", "pb2"), 
  land = c(0L, 1L, 1L, 1L, 1L), 
  datetime = sprintf("2004-04-05 %02d:%02d:00", 
                     c(1, 2, 16, 18, 20), 
                     c(44, 0, 0, 1, 0))
)

I would like to make a column that cumulatively adds time (in days) but ONLY if there is a '1' in the land column. I also would like the count to reset when the id changes.

I have tried a variety of methods using data.table, rleid, and even a nested for loop with no success. I have gotten errors using code like this:

DT[, total :=land*diff(as.numeric(datetime)), .(id, rleid(land))]

I have tried variations of the solution here: Calculating cumulative time in R

I'm not sure the best way to calculate the time interval (no success with difftime or lubridate).

I want the end result to look like this:

id      land           datetime         cumtime.land
pb1     0       2004-04-05 01:44:00     0
pb1     1       2004-04-05 02:00:00     0
pb1     1       2004-04-06 16:00:00     1.58333
pb2     1       2004-04-05 18:00:00     0
pb2     1       2004-04-05 20:00:00     0.08333

Are you looking for this: `all[, total := c(0, land * diff(as.numeric(datetime) / 86400)), by = .(id, rleid(land))]` ? — Jaap, Dec 20 '18 at 20:22
btw: don't use function names as object names, it might confuse R; see also `?all` — Jaap, Dec 20 '18 at 20:25
Could you add one more line of `pb1` in your example ? I'm not sure about the exact nature of your question. — Dan Chaltiel, Dec 21 '18 at 12:13
the `datetime` result in your "desired results" doesn't match that in your initial input... — MichaelChirico, Dec 21 '18 at 23:11

Dan Chaltiel · Accepted Answer · 2019-01-09T08:27:10.270

I could not replicate @Japp's comment, but you can easily do this with dplyr.

Depending on what your exact expected output is, you could stop before the summarize call:

library(dplyr)
df=read.table(text=
    "id      land    datetime
    pb1     0       '2004-04-05 01:44:00'
    pb1     1       '2004-04-05 02:00:00'
    pb1     1       '2004-04-06 16:00:00'
    pb1     1       '2004-04-07 16:00:00'
    pb2     1       '2004-04-05 18:00:00' 
    pb2     1       '2004-04-05 20:00:00'", header=T) %>% 
  mutate(datetime=as.POSIXct(datetime,format='%Y-%m-%d %H:%M:%S'))

x = df %>% 
  group_by(id) %>% 
  arrange(id, datetime) %>% 
  mutate(time.land=ifelse(land==0 | is.na(lag(land)) | lag(land)==0, 
                             0,
                             difftime(datetime, lag(datetime), units="days"))) %>% 
  mutate(cumtime.land=time.land + ifelse(is.na(lag(time.land)), 0, lag(time.land)))

  id     land datetime            time.land cumtime.land
  <fct> <int> <dttm>                  <dbl>        <dbl>
1 pb1       0 2004-04-05 01:44:00    0            0     
2 pb1       1 2004-04-05 02:00:00    0            0     
3 pb1       1 2004-04-06 16:00:00    1.58         1.58  
4 pb1       1 2004-04-07 16:00:00    1            2.58  
5 pb2       1 2004-04-05 18:00:00    0            0     
6 pb2       1 2004-04-05 20:00:00    0.0833       0.0833

The key is to use the dplyr::lag() function which takes the "line just above" in the table (which implies that you have to arrange() it beforehand).

By wrapping this inside the ifelse, I'm checking that land and previous land were not 0 (and that we are not in the first line of the id, or lag(anything) will be missing).

I then just reuse the lag() function to get the cumtime.land variable.

Thank you, @DanChaltiel! This is SUPER close to what I want. The only thing is I'd like the cumtime.land column to be cumulative. For example, in x[4,4] I'd like to see the value 2.58 instead of 1. — NorthLattitude, Jan 08 '19 at 18:38

MichaelChirico · Answer 2 · 2018-12-21T23:22:15.007

I believe you're after:

DT[land == 1, cumtime.land = 
     cumsum(c(0, diff(as.numeric(datetime))))/86400, by = id]

as.numeric(datetime) converts it to seconds so we use 86400 to convert to days.

Somewhat more "official" in the sense of leveraging time/date classes directly is to use difftime and shift:

DT[land == 1, by = id,
   cumtime.land := 
     cumsum(as.double(difftime(
       datetime, shift(datetime, fill = datetime[1L]), units = 'days'
     )))]

I switched the order of the by argument simply to help with formatting.

We use datetime[1L] to fill so that the initial difference is 0; we need as.double because cumsum errors as it's not confident how to deal with difftime objects as input.

Cumulative time with reset

2 Answers2

Linked