I'm trying to use diff() to find the time difference between consecutive observations on the same day using dplyr. If I calculate the difference between all observations (column t1), the units are consistent (in this case minutes). If I group by day (column t2), the units are hours for the first day and minutes for the second, so the column ends up in inconsistent units.
library(dplyr)
testdata <- data.frame(
  Day = structure(c(rep(11549:11550, c(5, 15))), class = "Date"),
  Time = structure(c(997878000, 997883400, 997897200, 997906500, 997913100,
                     997919400, 997924200, 997928700, 997934100, 997939200,
                     997944900, 997951500, 997957500, 997961700, 997965900,
                     997969500, 997972800, 997976100, 997981500, 997990500),
                   class = c("POSIXct", "POSIXt"), tzone = ""))

testdata %>%
  mutate(t1 = c(NA, diff(Time))) %>%   # differences over all rows
  group_by(Day) %>%
  mutate(t2 = c(NA, diff(Time)))       # differences within each day
## A tibble: 20 x 4
## Groups: Day [2]
#   Day        Time                   t1    t2
#   <date>     <dttm>              <dbl> <dbl>
# 1 2001-08-15 2001-08-15 13:20:00    NA NA
# 2 2001-08-15 2001-08-15 14:50:00    90  1.5
# 3 2001-08-15 2001-08-15 18:40:00   230  3.83
# 4 2001-08-15 2001-08-15 21:15:00   155  2.58
# 5 2001-08-15 2001-08-15 23:05:00   110  1.83
# 6 2001-08-16 2001-08-16 00:50:00   105 NA
# 7 2001-08-16 2001-08-16 02:10:00    80 80
# 8 2001-08-16 2001-08-16 03:25:00    75 75
# 9 2001-08-16 2001-08-16 04:55:00    90 90
#10 2001-08-16 2001-08-16 06:20:00    85 85
#...
The question is linked to this and this question but seems sufficiently different to warrant a new question. Using this avoids the problem by using difftime instead of diff.
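For illustration, here is a minimal base R sketch of what seems to be going on and one possible way around it. diff() on POSIXct values returns a difftime whose units are picked automatically from the smallest gap in the vector it receives, and c(NA, ...) then drops those units, leaving bare numbers. The vectors day1 and day2 and the helper gap_mins are hypothetical names made up for this example; the timestamps are a few values copied from testdata, and tz = "UTC" is an assumption used only to make the sketch reproducible.

```r
# A few timestamps copied from testdata (hypothetical names day1 / day2).
day1 <- as.POSIXct(c(997878000, 997883400, 997897200),
                   origin = "1970-01-01", tz = "UTC")
day2 <- as.POSIXct(c(997965900, 997969500, 997972800),
                   origin = "1970-01-01", tz = "UTC")

# The automatic unit depends on the smallest gap within each vector.
units(diff(day1))  # "hours" (smallest gap is 90 minutes)
units(diff(day2))  # "mins"  (smallest gap is 55 minutes)

# Hypothetical helper: request minutes explicitly before the units are dropped.
gap_mins <- function(x) c(NA, as.numeric(diff(x), units = "mins"))

gap_mins(day1)  # NA  90 230
gap_mins(day2)  # NA  60  55
```

In the grouped pipeline, the same idea would presumably read mutate(t2 = c(NA, as.numeric(diff(Time), units = "mins"))), which should keep t2 in minutes for every day.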