
I'm trying to use diff() to find the time difference between consecutive observations on the same day using dplyr. If I calculate the difference across all observations (column t1), the units are consistent (minutes in this case). If I group by day (column t2), the units are hours for the first day and minutes for the second, so the column ends up with inconsistent units.

library(dplyr)

testdata <- data.frame(Day = structure(c(rep(11549:11550,c(5,15))), class = "Date"),
                       Time = structure(c(997878000, 997883400, 997897200, 997906500, 997913100, 
                                          997919400, 997924200, 997928700, 997934100, 997939200, 997944900, 
                                          997951500, 997957500, 997961700, 997965900, 997969500, 997972800, 
                                          997976100, 997981500, 997990500), 
                                        class = c("POSIXct", "POSIXt"), tzone = ""))


testdata %>% mutate(t1 = c(NA, diff(Time)) )  %>%
  group_by(Day) %>% mutate(t2 = c(NA, diff(Time)) )

## A tibble: 20 x 4
## Groups:   Day [2]
#   Day        Time                   t1     t2
#   <date>     <dttm>              <dbl>  <dbl>
# 1 2001-08-15 2001-08-15 13:20:00    NA  NA   
# 2 2001-08-15 2001-08-15 14:50:00    90   1.5 
# 3 2001-08-15 2001-08-15 18:40:00   230   3.83
# 4 2001-08-15 2001-08-15 21:15:00   155   2.58
# 5 2001-08-15 2001-08-15 23:05:00   110   1.83
# 6 2001-08-16 2001-08-16 00:50:00   105  NA   
# 7 2001-08-16 2001-08-16 02:10:00    80  80   
# 8 2001-08-16 2001-08-16 03:25:00    75  75   
# 9 2001-08-16 2001-08-16 04:55:00    90  90   
#10 2001-08-16 2001-08-16 06:20:00    85  85  
#...

This question is linked to this and this question, but seems sufficiently different to warrant a new question. Using this avoids the problem by using difftime instead of diff.
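For reference, the difftime route mentioned above might look roughly like the sketch below. The data are the same values as in the question; tz = "UTC" and the name result are my own assumptions for reproducibility, and lag() is dplyr's, not stats::lag().

```r
library(dplyr)

# Same values as testdata in the question; tz = "UTC" is an assumption,
# it does not affect the differences between consecutive times
testdata <- data.frame(
  Day  = as.Date(rep(11549:11550, c(5, 15)), origin = "1970-01-01"),
  Time = as.POSIXct(c(997878000, 997883400, 997897200, 997906500, 997913100,
                      997919400, 997924200, 997928700, 997934100, 997939200,
                      997944900, 997951500, 997957500, 997961700, 997965900,
                      997969500, 997972800, 997976100, 997981500, 997990500),
                    origin = "1970-01-01", tz = "UTC")
)

# Ask difftime() for minutes explicitly, then convert to plain numbers
result <- testdata %>%
  group_by(Day) %>%
  mutate(t2 = as.numeric(difftime(Time, dplyr::lag(Time), units = "mins"))) %>%
  ungroup()

result$t2[1:7]
```

Because the units are fixed in the call, the grouping no longer influences them.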

Miff

1 Answer


For date-times, diff() returns a difftime whose units depend on the data it is given, although I can't see anything in the documentation for diff() itself that says what criteria it uses to pick them. If the function is called explicitly, the units are printed, e.g.

diff(testdata$Time[1:5])
# Time differences in hours
# [1] 1.500000 3.833333 2.583333 1.833333
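As far as I can tell the rule is the one described in ?difftime for units = "auto": the largest unit (excluding weeks) in which all the absolute differences are greater than one. A small base R check of that reading, using the Time values from the question (tz = "UTC" and the variable names are my assumptions):

```r
# Time values from the question, in seconds since the epoch
secs <- c(997878000, 997883400, 997897200, 997906500, 997913100,
          997919400, 997924200, 997928700, 997934100, 997939200,
          997944900, 997951500, 997957500, 997961700, 997965900,
          997969500, 997972800, 997976100, 997981500, 997990500)
Time <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

day1  <- diff(Time[1:5])   # smallest gap is 90 minutes, so every gap exceeds one hour
day2  <- diff(Time[6:20])  # smallest gap is 55 minutes, so hours would not fit
whole <- diff(Time)        # also contains the 55-minute gap

units(day1)   # "hours"
units(day2)   # "mins"
units(whole)  # "mins"
```

That matches the output in the question: only the first day has no gap shorter than an hour.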

When the function is called from within mutate the units message is never printed, and c(NA, diff(Time)) silently drops the difftime class, leaving a plain numeric in whatever units diff() happened to pick for that group. One way around this is to define a function that makes the conversion using an explicit unit, e.g.

# diff() with the result forced to minutes, whatever units were picked automatically
mydiff <- function(x) { y <- diff(x); units(y) <- "mins"; y }

testdata %>% mutate(t1 = c(NA, mydiff(Time)) )  %>%
  group_by(Day) %>% mutate(t2 = c(NA, mydiff(Time)) )


# # A tibble: 20 x 4
# # Groups:   Day [2]
#   Day        Time                   t1    t2
#   <date>     <dttm>              <dbl> <dbl>
# 1 2001-08-15 2001-08-15 13:20:00    NA    NA
# 2 2001-08-15 2001-08-15 14:50:00    90    90
# 3 2001-08-15 2001-08-15 18:40:00   230   230
# 4 2001-08-15 2001-08-15 21:15:00   155   155
# 5 2001-08-15 2001-08-15 23:05:00   110   110
# 6 2001-08-16 2001-08-16 00:50:00   105    NA
# 7 2001-08-16 2001-08-16 02:10:00    80    80
# 8 2001-08-16 2001-08-16 03:25:00    75    75
# 9 2001-08-16 2001-08-16 04:55:00    90    90
#10 2001-08-16 2001-08-16 06:20:00    85    85
#...
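
A slightly shorter variant, if a plain numeric column is all that is needed, is to pass the unit to as.numeric() so the conversion is explicit and does not rely on c() dropping the class. This is just a sketch: gap_mins is a made-up helper name, the times are a subset of the question's data, and tz = "UTC" is an assumption.

```r
# First-day times from the question, plus three second-day times with 55-minute gaps
day1 <- as.POSIXct(c(997878000, 997883400, 997897200, 997906500, 997913100),
                   origin = "1970-01-01", tz = "UTC")
day2 <- as.POSIXct(c(997969500, 997972800, 997976100),
                   origin = "1970-01-01", tz = "UTC")

# Hypothetical helper: gaps in minutes as plain numbers, NA for the first observation
gap_mins <- function(x) c(NA, as.numeric(diff(x), units = "mins"))

gap_mins(day1)  # NA 90 230 155 110
gap_mins(day2)  # NA 55 55
```

Inside mutate this would be used as t2 = gap_mins(Time) after group_by(Day).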
Miff