Issues in writing and reading datetime in R

Question

Below is an example, where I calculate travel time between to points. I create a tibble with start time, duration, end_time and time_diff = end_time - start_time. I wrote the tibble using write_csv() and read it again with read_csv()

library(sf)
library(tidygeocoder)
library(osrm)
library(lubridate)

# 1. One World Trade Center, NYC
# 2. Madison Square Park, NYC
adresses <- c("285 Fulton St, New York, NY 10007", 
              "11 Madison Ave, New York, NY 10010")

# geocode the two addresses & transform to {sf} data structure
data <- tidygeocoder::geo(adresses, method = "osm") %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326)

rownames(data) <- c("One World Trafe Center", "Madison Square Park")

# calculate travel time from "One World Trade Center" to "Madison Square Park"
osroute <- osrmTable(src = data["One World Trade Center", ],
                     dst = data["Madison Square Park", ])

tbl_out <- tibble(start_trip = ymd_hms("2018-01-01 08:00:00", tz = "America/New_York"), 
              duration = osroute$durations,
              end_trip = start_trip + 60 * osroute$durations,
              time_diff = difftime(end_trip, start_trip, units = "mins")

write.csv(tbl_out, 
          "sample.csv")

tbl_in <- read_csv("sample.csv")

Here is the screenshot of tbl_out (used to write the data)

Here is the screenshot of tbl_in (read using read_csv)

Can someone help me fix some issues -

Datetime changed from EST to UTC after writing into csv and then reading from csv file
Time_diff and duration have same values. But time_diff is better readable and how can I add unit mins to duration. I don't want to convert it character and paste min.
Is there a way to convert time duration and time_diff to HH:MM and still perform basic operation like addition
Follow up on 3. In case we convert time to HH:MM. Can I write 74.5 mins as 74:30.
time_diff in tbl_in after reading from csv file does not have min

score 1 · Answer 1 · answered Aug 27 '21 at 19:09

The time's timezone is not written to file when you write.csv, so when reading it in there is no clue what TZ it should be. I suggest you always write in one timezone (e.g., "UTC"), and then explicitly cast it when you read it in. You can do this with attr<-:
```
now <- Sys.time()
now
# [1] "2021-08-27 14:09:59 EDT"
attr(now, "tzone") <- "UTC"
now
# [1] "2021-08-27 18:09:59 UTC"
```

What you're seeing is a "difftime"-class object. You can convert any numeric vector into this; all that that class does is change the print method used for it.

times <- c(10.4, 19)
times + 1
# [1] 11.4 20.0
times <- structure(times, class="difftime", units="mins")
times
# Time differences in mins
# [1] 10.4 19.0
times + 1
# Time differences in mins
# [1] 11.4 20.0

One implementation could be to use data.table::as.ITime:

times <- c(10.41, 19.01)
as.ITime(60*times) # ITime assumes seconds for all numbers
# [1] "00:10:24" "00:19:00"
as.ITime(60*times) + 60
# [1] "00:11:24" "00:20:00"

Frankly, I think that may be the best way-forward, even if it gives you HH:MM:SS instead of just HH:MM (perhaps there's a way to customize that ... I don't know offhand). One benefit of this method is that it writes to CSVs as the HH:MM:SS format (which may or may not be perfect):

write.csv(data.frame(x = as.ITime(60*times)), "foo.csv")
readLines("foo.csv")
# [1] "\"\",\"V1\""    "\"1\",00:10:24" "\"2\",00:19:00"

However, if that doesn't fit your needs, then below is a hack that might be sufficient for your needs. (Note that this, when saved to a CSV, is not saved in that format, it is saved as a number.)

format.my_difftime <- function(x, ..., digits = getOption("digits.my_difftime", 0)) {
  if (is.null(digits)) digits <- getOption("digits.my_difftime", 0) # idk why this is needed
  units <- attr(x, "units")
  if (!is.null(units)) {
    mult <- switch(units,
                   sec=, secs=1/60,
                   min=, mins=1,
                   hour=, hours=60,
                   day=, days=86400,
                   NA)
    if (is.na(mult)) {
      warning("Unrecognized units, ignoring: ", sQuote(units, q = FALSE))
      mult <- 1
    }
    x <- x * mult
  } # else assume 'mins'
  fmt <- paste0("%02i:%0", digits+2+(digits>0), ".0", digits, "f")
  sprintf(fmt, as.integer(x), 60 * (x %% 1))
}
print.my_difftime <- function(x, ...) cat(format(x), "\n")
as.data.frame.my_difftime <- as.data.frame.difftime
units.my_difftime <- function(x) attr(x, "units")
`units<-.my_difftime` <- function(x, value) {
  attr(x, "units") <- value
  x
}

Demonstration:

times <- c(10.41, 19.01)
structure(times, class = "my_difftime", units = "sec")
# 00:10 00:19 
structure(times, class = "my_difftime", units = "min")
# 10:25 19:01 
structure(times, class = "my_difftime", units = "hour")
# 624:36 1140:36 

options(digits.my_difftime = 3)
times <- structure(times, class = "my_difftime", units = "min")
times
# 10:24.600 19:00.600 
data.frame(x = times)
#           x
# 1 10:24.600
# 2 19:00.600
options(digits.my_difftime = 0)
data.frame(x = times)
#       x
# 1 10:25
# 2 19:01
dput(data.frame(x = times))
# structure(list(x = structure(c(10.41, 19.01), class = "my_difftime", units = "min")), class = "data.frame", row.names = c(NA, -2L))

(Resolved in 3.)
Similar to #1, write.csv does not include units when it writes to a file; in fact, if it did, then read.csv (and most other CSV-reading functions) might presume that the column is character instead.
```
write.csv(data.frame(x = times), "foo.csv") # using my_difftime, not ITime
readLines("foo.csv")
# [1] "\"\",\"x\""  "\"1\",10.41" "\"2\",19.01"
```
The numbers are preserved.

In the case of this naïve implementation for my_difftime, though, the numbers never change, so the "units" attribute is merely for presentation. That is, when you calculate times, make sure that its units are always "minutes" (in my assumption of the OP/question) or something known.

From there, for TZ and for my_difftime, once somebody read.csv's the file, they are responsible for properly classing the column.

Thank your very much. I would appreciate if you can work on my dataset to show the desired changes. For ex - `attr(now, "tzone") <- "UTC"` how do I use it with tibble with multiple datetime columns — SiH, Aug 27 '21 at 20:55
(1) What dataset? You provided images, and there are several reasons why that doesn't work well. (2) You can do that for each column. Do you have so many columns that that doesn't work well? Since you're using `tibble`s, I'm inferring you're using `dplyr` as well, in which case `mutate(across(c(column1,column3), ~ attr(., "tzone") <- "UTC"))` should work, replacing `column1` etc with your real `POSIXt` column names (not in quotes). — r2evans, Aug 27 '21 at 21:18
Several links are worth pointing out to you: https://meta.stackoverflow.com/a/285557 (images of code/data); and I think you being here a while, you may have read these already, but it not: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info are good links for crafting a question as "reproducible" as possible. Thanks! — r2evans, Aug 27 '21 at 21:19
(I'll add that your question has nothing to do with `tidygeocoder`, so it doesn't make sense for us to install it just to get sample data. The question should only contain code relevant to the issue. Thanks for understanding.) — r2evans, Aug 28 '21 at 13:20

Issues in writing and reading datetime in R

1 Answers1