
I would like to calculate the cumulative time difference from the start. I coded a rough solution, which I do not particularly like. Does anybody have a more elegant and reliable solution that can be used in a dplyr pipe? The desired result should be as in the diffCum column.

require(dplyr)

d = data.frame(n = 1:3, t = lubridate::ymd_hms("2020-03-30 08:15:39","2020-03-30 10:15:39","2020-03-30 14:15:39")) %>%
  mutate(diffMin = difftime(t, lag(t, 1), units = "mins")) %>%
  mutate(diffMin = ifelse(is.na(diffMin), 0, diffMin)) %>% # error-prone, as it would also capture other NAs
  mutate(diffCum = cumsum(diffMin)) # does not work with difftime class
Thorsten
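To make the "capture other NAs" concern concrete, here is a small sketch with a hypothetical fourth row whose timestamp is genuinely missing (the data is invented for illustration, and base `as.POSIXct()` stands in for `lubridate::ymd_hms()` only to keep it self-contained). The `ifelse()` patch turns every NA difference into 0, while `dplyr::lag()` with a `default` only fills the first row:

```r
library(dplyr)

# Hypothetical data: row 3 has a missing timestamp
d <- data.frame(
  n = 1:4,
  t = as.POSIXct(c("2020-03-30 08:15:39", "2020-03-30 10:15:39", NA,
                   "2020-03-30 14:15:39"), tz = "UTC")
)

# Question's approach: lag() pads with NA, then ifelse() replaces *all* NAs with 0
patched <- d %>%
  mutate(diffMin = as.numeric(difftime(t, dplyr::lag(t), units = "mins")),
         diffMin = ifelse(is.na(diffMin), 0, diffMin))

# lag() with default = t[1]: only the first row is filled, real gaps stay NA
with_default <- d %>%
  mutate(diffMin = as.numeric(difftime(t, dplyr::lag(t, default = t[1]), units = "mins")))

patched$diffMin       # 0 120 0 0
with_default$diffMin  # 0 120 NA NA
```

In the second version row 4 is NA as well, because its lagged value is the missing timestamp; the point is that the gap stays visible instead of being silently counted as 0 minutes.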
  • The second `mutate` can be removed if you first use `lag(t, 1, default = t[1])`. After that, you can make it a single mutate (if you don't need `diffMin`) with `diffCum = cumsum(as.numeric(difftime(t, lag(t, 1, default = t[1]), unit = "mins")))`. – r2evans May 11 '20 at 14:50
  • (BTW: You should almost always use `library`, not `require`. The latter never stops following code when the package is not available, which is almost never what is intended. Refs: https://stackoverflow.com/a/51263513) – r2evans May 11 '20 at 14:55
  • Thanks r2evans. Basically, your answer and John's are more or less the same. Yours would earn an "accept" as well. – Thorsten May 11 '20 at 15:09
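A minimal sketch of r2evans's single-`mutate` suggestion from the comments above, assuming a reasonably recent dplyr. It loads dplyr with `library()` as recommended there and uses base `as.POSIXct()` in place of `lubridate::ymd_hms()` only so the snippet has no extra dependency:

```r
library(dplyr)

# Sample data from the question
d <- data.frame(
  n = 1:3,
  t = as.POSIXct(c("2020-03-30 08:15:39", "2020-03-30 10:15:39",
                   "2020-03-30 14:15:39"), tz = "UTC")
)

d <- d %>%
  mutate(
    # default = t[1] makes the first lagged value equal to the first time,
    # so the first step difference is 0 instead of NA (no NA patching needed)
    diffMin = as.numeric(difftime(t, dplyr::lag(t, 1, default = t[1]), units = "mins")),
    # as.numeric() drops the difftime class, so cumsum() works directly
    diffCum = cumsum(diffMin)
  )

d$diffCum  # 0 120 360
```

If `diffMin` is not needed, the two expressions can be collapsed into one, as in the comment.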

1 Answer


I'm unsure of what you mean by "capturing other NAs" and I'm also unsure if this qualifies as elegant!

d <- 
  data.frame(n = 1:3, t = lubridate::ymd_hms("2020-03-30 08:15:39","2020-03-30 10:15:39","2020-03-30 14:15:39")) %>%
  mutate(
    diffMin = difftime(t, lag(t, 1, default = t[1]), units = "mins") %>%
      as.numeric() %>%
      cumsum()
    )
John
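A variation not taken from this thread: a cumulative sum of consecutive differences telescopes, since (t2 - t1) + (t3 - t2) = t3 - t1, so the running total is just each time minus the first one and neither `lag()` nor `cumsum()` is needed. A sketch, assuming the rows are already in the order you want to accumulate over (both approaches depend on row order):

```r
library(dplyr)

# Sample data from the question (base as.POSIXct instead of lubridate)
d <- data.frame(
  n = 1:3,
  t = as.POSIXct(c("2020-03-30 08:15:39", "2020-03-30 10:15:39",
                   "2020-03-30 14:15:39"), tz = "UTC")
)

d <- d %>%
  # Time elapsed since the first row, in minutes, as a plain number
  mutate(diffCum = as.numeric(difftime(t, dplyr::first(t), units = "mins")))

d$diffCum  # 0 120 360
```

One practical difference: with `cumsum()`, a single NA difference makes every later total NA, whereas `t - first(t)` is NA only in the row whose timestamp is missing (as long as the first timestamp is present). Inside a `group_by()`, `first(t)` refers to each group's first row.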
  • While it's a good attempt, I've always cringed at nested `%>%`-pipes. *shrug* – r2evans May 11 '20 at 14:51
  • The `default = 0` does the trick for me. Interesting that it is not documented in the help. Thanks. – Thorsten May 11 '20 at 15:07
  • Thorsten, https://dplyr.tidyverse.org/reference/lead-lag.html shows "default" as an argument. Don't mistake it for `stats::lag`, though, which is unfortunately completely different. (Realize that `?lag` either (a) likely shows you `stats::lag`, or (b) prompts you to choose which package, `stats` or `dplyr` or `data.table`.) – r2evans May 11 '20 at 15:53
  • Isn't `default=0` *logically* wrong? I'd think that a running difference should always start at 0, but with `default=0` your first `diffMin` will be `26425936` (not 0). (Perhaps I misunderstand what you think `diffMin` should be.) – r2evans May 11 '20 at 15:54
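To illustrate the point above about the two `lag()` functions, a small sketch on a plain numeric vector (the vector is made up for illustration). `dplyr::lag()` shifts the values and pads with `default`; `stats::lag()` leaves the values alone and shifts the time base of a time series instead:

```r
x <- c(10, 20, 30)

# dplyr::lag shifts values right, padding with `default` (NA if not given)
a <- dplyr::lag(x)                  # NA 10 20
b <- dplyr::lag(x, default = x[1])  # 10 10 20

# stats::lag keeps the values and moves the time index back by k = 1
s <- stats::lag(ts(x))
v <- as.numeric(s)                  # 10 20 30
p <- tsp(s)                         # 0 2 1 (start, end, frequency)
```

So in a pipeline like the ones above, an unqualified `lag()` only does what is intended if it resolves to the dplyr version; writing `dplyr::lag()` removes the ambiguity.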