Is there a reason why the output of difftime differs based on whether the inputs are wrapped by as.character or not? For example:

library(tidyverse)

test <- structure(list(date1 = structure(c(1632745800, 1632745800), 
                                         tzone = "UTC", class = c("POSIXct", "POSIXt")), 
                       date2 = structure(c(1641468180, 1641468180), tzone = "UTC", class = c("POSIXct", "POSIXt"))), 
                  row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

test %>% 
  mutate(
    date_diff = difftime(date2, date1, units = "days"),
    date_diff2 = difftime(as.character(date2), as.character(date1), units = "days")
  ) %>% 
  print.data.frame()
#>                 date1               date2     date_diff    date_diff2
#> 1 2021-09-27 12:30:00 2022-01-06 11:23:00 100.9535 days 100.9951 days
#> 2 2021-09-27 12:30:00 2022-01-06 11:23:00 100.9535 days 100.9951 days

It only differs by ~0.04 in this case, but is there a reason why? And which one would be considered correct? Thank you!

vizidea
  • Also noting that with the lubridate package, `time_length(interval(date1, date2), "days")` gives the same output as `date_diff`. So I'm assuming the version without as.character() is correct, but I'm still not sure why the outputs differ – vizidea Jun 08 '23 at 22:13
  • [@vizidea](https://stackoverflow.com/users/14908133/vizidea) That is because lubridate::interval() uses the default argument tzone = tz(start) to set the time zone. Since your structure contains "UTC", it picks that up and returns the same result as `date_diff`. See the documentation [here](https://lubridate.tidyverse.org/reference/interval.html) – Abdullah Faqih Jun 08 '23 at 22:40

2 Answers

The conversion to character is lossy because you lose the time zone information. Your original datetimes are specified to be in UTC. If you use as.character() and reparse them, they are interpreted in your local time zone, where it seems one of the dates falls in daylight saving time and the other does not. That adds one extra hour to the difference, and one hour is 1/24 ≈ 0.04 days, which matches the gap you see.

x <- as.POSIXct(1632745800, tz = "UTC")
y <- as.POSIXct(1641468180, tz = "UTC")

x
#> [1] "2021-09-27 12:30:00 UTC"
as.character(x)
#> [1] "2021-09-27 12:30:00"
as.POSIXct(as.character(x))
#> [1] "2021-09-27 12:30:00 BST"
as.POSIXct(as.character(y))
#> [1] "2022-01-06 11:23:00 GMT"
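
To see the extra hour without depending on the machine's own settings, here is a minimal sketch that reparses the strings with an explicit time zone. "Europe/London" is only an assumption, chosen because the BST/GMT output above suggests it; any zone that observes daylight saving time between the two dates should behave the same way.

```r
# Same instants as in the question, stored as UTC
x <- as.POSIXct(1632745800, origin = "1970-01-01", tz = "UTC")
y <- as.POSIXct(1641468180, origin = "1970-01-01", tz = "UTC")

# as.character() keeps the clock time but drops the "UTC" label
x_chr <- as.character(x)
y_chr <- as.character(y)

# Reparse the strings as UTC: same instants as the originals
d_utc <- difftime(as.POSIXct(y_chr, tz = "UTC"),
                  as.POSIXct(x_chr, tz = "UTC"), units = "days")

# Reparse as Europe/London (assumed local zone):
# the September date is read as BST, the January date as GMT
d_lon <- difftime(as.POSIXct(y_chr, tz = "Europe/London"),
                  as.POSIXct(x_chr, tz = "Europe/London"), units = "days")

# Gap between the two results, converted from days to hours
extra_hours <- (as.numeric(d_lon) - as.numeric(d_utc)) * 24

print(d_utc)
print(d_lon)
print(extra_hours)
```

The two differences should be exactly one hour apart, which is the daylight saving offset picked up by the September date only.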
Mikko Marttila

It's due to as.POSIXct using the machine's local time zone when it converts from the string. Using the original datetime objects is preferred.
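
If you do end up with character inputs, a sketch of one way around it is to tell difftime which zone the strings are in through its tz argument, which is used when the inputs need converting. This assumes the strings really were written in UTC, as in the question; the variable names are just for illustration.

```r
# Same instants as in the question, stored as UTC
x <- as.POSIXct(1632745800, origin = "1970-01-01", tz = "UTC")
y <- as.POSIXct(1641468180, origin = "1970-01-01", tz = "UTC")

# Preferred: pass the datetime objects directly
d_direct <- difftime(y, x, units = "days")

# With character inputs, state the zone explicitly (assumed to be UTC here)
d_from_chr <- difftime(as.character(y), as.character(x),
                       tz = "UTC", units = "days")

print(d_direct)
print(d_from_chr)
```

Both calls should give the same number of days, because the strings are now parsed in the zone they were written in rather than in the local one.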

Abdullah Faqih