
In R, I have a bunch of datetime values that I measure in GMT. I keep running into accidents where some function or other loses the timezone on my values, or even loses the class name, even with functions as basic as c() and unlist():

> dput(x)
structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
> dput(c(x))
structure(1317830532, class = c("POSIXct", "POSIXt"))
> dput(list(x))
list(structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT"))
> dput(unlist(list(x)))
1317830532

I feel like I'm a hair's breadth away from having a real Mars Climate Orbiter moment if this happens when I least expect it. Anyone have any strategies for making sure their dates "stay put"?

Ken Williams
  • See also [c(a, b) for POSIXct objects with tzone attributes?](https://stat.ethz.ch/pipermail/r-help//2012-July/317759.html) – Henrik May 25 '18 at 20:55

3 Answers


This behaviour is documented in ?c, ?DateTimeClasses and ?unlist:

From ?DateTimeClasses:

Using c on "POSIXlt" objects converts them to the current time zone, and on "POSIXct" objects drops any "tzone" attributes (even if they are all marked with the same time zone).

From ?c:

c is sometimes used for its side effect of removing attributes except names.


That said, my testing indicates that the integrity of your data remains intact, despite using c or unlist. For example:

x <- structure(1317830532, class = c("POSIXct", "POSIXt"), 
                 tzone = "GMT")
y <- structure(1317830532+3600, class = c("POSIXct", "POSIXt"), 
                 tzone = "PST8PDT")
x
[1] "2011-10-05 16:02:12 GMT"

y
[1] "2011-10-05 10:02:12 PDT"

strftime(c(x, y), format="%Y/%m/%d %H:%M:%S", tz="GMT")
[1] "2011/10/05 16:02:12" "2011/10/05 17:02:12"

strftime(c(x, y), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")
[1] "2011/10/05 09:02:12" "2011/10/05 10:02:12"

strftime(unlist(y), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")
[1] "2011/10/05 10:02:12"

Your Mars Rover should be OK if you use R to keep track of dates.
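Andrie's point can be checked directly: c() keeps the stored seconds since the epoch and only loses the "tzone" attribute, while unlist() also drops the class. A minimal sketch reusing the question's x and the y above; the do.call(c, ...) idiom for flattening a list of POSIXct values is a common workaround, not something taken from this thread. Whether c() keeps a zone that all inputs share may depend on your R version, so the sketch only relies on the mixed-zone case.

```r
x <- structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
y <- structure(1317830532 + 3600, class = c("POSIXct", "POSIXt"), tzone = "PST8PDT")

combined <- c(x, y)
# The instants (seconds since 1970-01-01 UTC) survive c() unchanged ...
same_instants <- identical(as.numeric(combined), c(as.numeric(x), as.numeric(y)))
# ... but with differing input zones the "tzone" attribute is gone,
# so printing and formatting fall back to the session's time zone.
lost_tz <- is.null(attr(combined, "tzone"))

# unlist() strips the class as well; do.call(c, ...) dispatches to the
# POSIXct method of c() and keeps the class.
lst <- list(x, y)
flat_bad  <- unlist(lst)       # plain numeric
flat_good <- do.call(c, lst)   # still POSIXct

print(c(same_instants = same_instants, lost_tz = lost_tz))
print(class(flat_bad))
print(class(flat_good))
```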

Andrie
  • I think my rover is in jeopardy actually - because I'm looking at things like the time of day that events happened (by doing `x-floor_date(x,'day')`, for example), and if time zones are silently removed, those numbers end up wrong. – Ken Williams Oct 05 '11 at 19:35
  • One more example: `strftime(unlist(list(y)), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")` fails with `Error in as.POSIXlt.numeric(x, tz = tz) : 'origin' must be supplied`. – Ken Williams Oct 05 '11 at 19:41
  • In short, I do understand that this is the documented behavior, I just think it's extremely error prone and unlikely to really be helpful. If I want to convert to local time, I wouldn't just call `c()`, I'd make my code more explicit anyway. – Ken Williams Oct 05 '11 at 19:44
  • @KenWilliams My understanding is not that `c` converts a time to local time. Yes, it strips the original `tz`, but the actual time remains unchanged. What happens is that the implicit conversion to local time happens in your later calculations. I can see how that will lead to a bookkeeping problem if you later want to know what the local time was and you no longer have a record of the tz. I'm sorry, but I can't think of an easy fix for this. – Andrie Oct 06 '11 at 14:08
  • Right - more precisely, `c` removes any timezone attribute, then later, various other functions will pick a default timezone based on my environment. The instant relative to GMT remains the same, but the time-of-day shifts around. – Ken Williams Nov 07 '11 at 15:42
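Ken's worry is about derived quantities such as time of day, which do depend on the zone. One defensive pattern, sketched here in base R rather than with lubridate's floor_date(), is to make the zone a required argument so the result never depends on the session default or on whether "tzone" survived an earlier call. The helper name time_of_day is hypothetical.

```r
# Seconds since local midnight, computed in a zone the caller must name.
# There is deliberately no default for `tz`: forgetting it is an error,
# not a silent fallback to the session's TZ.
time_of_day <- function(t, tz) {
  lt <- as.POSIXlt(t, tz = tz)
  lt$hour * 3600 + lt$min * 60 + lt$sec
}

x <- structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
y <- structure(1317830532 + 3600, class = c("POSIXct", "POSIXt"), tzone = "PST8PDT")

time_of_day(x, "GMT")         # 57732, i.e. 16:02:12
time_of_day(c(x, y), "GMT")   # 57732 61332, even though c() dropped "tzone"
time_of_day(y, "PST8PDT")     # 36132, i.e. 10:02:12 PDT
```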

Why not set your timezone to GMT for your R sessions, then? If something gets converted to the "current" timezone, it is still right.

Brian Diggs
  • Yeah, I'm considering doing that. But it's not like I only ever work with data from one time zone, and I don't like this "action at a distance" that my environment is having on my data. – Ken Williams Oct 05 '11 at 19:39
  • You could just set it in the relevant sessions (script) from within R with `Sys.setenv(TZ="GMT")` – Brian Diggs Oct 05 '11 at 21:40
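Brian's suggestion can be as simple as calling Sys.setenv(TZ = "GMT") at the top of a script. To address Ken's "action at a distance" concern, here is a sketch that scopes the change to one expression and restores the previous setting afterwards; with_session_tz is a hypothetical helper, not a base R function, and it assumes your platform honours a TZ change made mid-session.

```r
# Hypothetical helper: evaluate `expr` with TZ temporarily set to `tz`.
with_session_tz <- function(tz, expr) {
  old <- Sys.getenv("TZ", unset = NA)
  # Restore the previous TZ (or unset it) no matter how we exit.
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old), add = TRUE)
  Sys.setenv(TZ = tz)
  force(expr)  # lazy evaluation: expr runs only now, after TZ is set
}

x <- structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
y <- structure(1317830532 + 3600, class = c("POSIXct", "POSIXt"), tzone = "PST8PDT")

# c() drops the differing zones, so format() uses the session zone (GMT here).
shown <- with_session_tz("GMT", format(c(x, y), "%Y-%m-%d %H:%M:%S"))
print(shown)
```

The trade-off is that code run outside the wrapper still sees whatever TZ the environment provides, so this only helps where you remember to use it.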

Given that this is documented behavior, and that one should either avoid such functions or code defensively around them, you need mechanisms that support either approach. For things like this, I would recommend writing a "poor man's lint"; with such a lint detector, you can go about restoring sanity. In addition to lint detection, there are several approaches to avoiding Mars Climate Orbiter crashes; some are independent of each other, others dependent:

  1. Set a policy & build alternatives: First, for all of the functions that you know are causing you problems, either decide that you won't use them, or write a wrapper function that behaves as intended and sets the timezone you want. Then make sure you use that wrapper rather than the underlying function.
  2. Static analysis: Write a search function using your favorite editor (e.g. as a macro), a shell script with GNU find and grep, or some other tool (e.g. grep() in R) to find calls to the functions that are causing you problems. When found, either remove them or use a defensive coding method (e.g. the wrapper in #1).
  3. Testing: Using unit tests, e.g. RUnit or testthat, develop tests that ensure timezone properties are maintained when using your functions or package. Every time there's a new bug, create a new test to ensure that bug doesn't reappear in released versions.
  4. Weak type checking: You can also include checks throughout your code that test whether a timezone is specified. It's best to have your own function for this check, rather than a block of code reproduced throughout. That way you can later extend the checking to other properties, such as persistence of the timezone, and whether operations on two or more objects are mindful of differences in timezones (maybe they allow it, maybe they don't).
  5. Map everything to one TZ: Also known as Indiana-be-damned. Maintaining a variety of policies about timezones is hard work, and is essentially friction when working with temporal data. Just map everything to one TZ (UTC) and let anything local work from that. If you have local regularities that should be invariant to DST, address them after converting back from UTC.
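Items 1, 4 and 5 can be combined into a couple of small helpers. A minimal sketch in base R; c_tz() and assert_tz() are hypothetical names, and using UTC as the single policy zone is an assumption that follows item 5.

```r
# Item 1 + 5: combine POSIXct values and re-attach one explicit zone.
c_tz <- function(..., tz = "UTC") {
  parts <- list(...)
  if (length(parts) == 0) stop("c_tz() needs at least one argument")
  is_ct <- vapply(parts, inherits, logical(1), what = "POSIXct")
  if (!all(is_ct)) stop("c_tz() only accepts POSIXct values")
  out <- do.call(c, parts)     # dispatches to c.POSIXct
  attr(out, "tzone") <- tz     # never rely on what c() decided to keep
  out
}

# Item 4: a reusable "weak type check" that fails fast.
assert_tz <- function(x, tz = "UTC") {
  if (!inherits(x, "POSIXct")) stop("expected a POSIXct value")
  actual <- attr(x, "tzone")
  if (is.null(actual) || !identical(actual[[1]], tz)) {
    got <- if (is.null(actual)) "<none>" else actual[[1]]
    stop(sprintf("expected tzone '%s', got '%s'", tz, got))
  }
  invisible(x)
}

x <- structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
y <- structure(1317830532 + 3600, class = c("POSIXct", "POSIXt"), tzone = "PST8PDT")

z <- c_tz(x, y)            # tzone is "UTC" regardless of the inputs
assert_tz(z, "UTC")        # passes silently
format(z, "%H:%M:%S")      # "16:02:12" "17:02:12"
```

Because assert_tz() lives in one place, it can later grow the extra checks item 4 mentions without touching its callers.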

I do all of #1-4 for other issues; just as they're easily adapted to timezone checking, they're fairly reusable for lots of Mars Orbiter-avoiding objectives. I do this kind of thing precisely to avoid coding the next such Mars Orbiter. (That was an expensive lesson for all of us who work with numerical data. :))
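For item 2, the "poor man's lint" can itself be written in R. A rough sketch, assuming your code lives in .R files under one directory; find_tz_hazards() is a hypothetical helper, and its regex is deliberately crude (it also flags c( calls that never touch dates, and matches inside comments or strings).

```r
# Hypothetical "poor man's lint": list every line in the R files under `path`
# that calls one of the functions known to drop POSIXct time zones.
find_tz_hazards <- function(path = ".", funs = c("c", "unlist")) {
  # \b keeps e.g. "abc(" from matching "c(".
  pattern <- sprintf("\\b(%s)\\(", paste(funs, collapse = "|"))
  files <- list.files(path, pattern = "\\.[Rr]$", recursive = TRUE, full.names = TRUE)
  hits <- lapply(files, function(f) {
    src <- readLines(f, warn = FALSE)
    idx <- grep(pattern, src, perl = TRUE)
    data.frame(file = rep(f, length(idx)), line = idx, code = src[idx],
               stringsAsFactors = FALSE)
  })
  empty <- data.frame(file = character(), line = integer(), code = character(),
                      stringsAsFactors = FALSE)
  do.call(rbind, c(list(empty), hits))
}

# Demo on a throwaway directory.
demo_dir <- tempfile("tzlint")
dir.create(demo_dir)
writeLines(c("x <- c(a, b)", "y <- abc(1)", "z <- unlist(l)"),
           file.path(demo_dir, "demo.R"))
hazards <- find_tz_hazards(demo_dir)
print(hazards)   # flags lines 1 and 3, not the abc() call
```

Each flagged line is then a candidate for removal or for the wrapper from item 1.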

Iterator