I know this is a long-standing, deeply embedded issue, but it's something I come up against so regularly, and that I see beginners to R struggle with so regularly, that I'd love to have a satisfactory solution. My Google and SO searches have come up empty so far, but please point me in the right direction if this is duplicated elsewhere.
TL;DR: Is there a way to use something like the POSIXct class without a timezone? I generally use tz="UTC" regardless of the actual timezone of the dataset, but it's a messy hack IMO, and I don't particularly like it. What I want is something like tz=NULL, which would behave the same way as UTC, but without actually adding "UTC" as a tzone attribute.
The problem
I'll start with an example (there are plenty) of typical timezone issues. Creating an object with POSIXct values:
df <- data.frame( timestamp = as.POSIXct( c( "2018-01-01 03:00:00",
                                             "2018-01-01 12:00:00" ) ),
                  a = 1:2 )
df
# timestamp a
# 1 2018-01-01 03:00:00 1
# 2 2018-01-01 12:00:00 2
That's all fine, but then I try to convert the timestamps to dates:
df$date <- as.Date( df$timestamp )
df
# timestamp a date
# 1 2018-01-01 03:00:00 1 2017-12-31
# 2 2018-01-01 12:00:00 2 2018-01-01
The dates have converted incorrectly because my computer's locale is Australian Eastern Time: as.POSIXct parsed the strings as local time, so the underlying numeric values are offset from UTC by 11 hours (AEDT in this case), and as.Date then works out the date in UTC by default. We can see the offset by forcing the timezone to UTC, then comparing the values before and after:
df$timestamp[1]
# [1] "2018-01-01 03:00:00 AEDT"
x <- lubridate::force_tz( df$timestamp[1], "UTC" ); x
# [1] "2018-01-01 03:00:00 UTC"
difftime( df$timestamp[1], x )
# Time difference of -11 hours
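For reference, a minimal sketch of how the same instant maps to two different dates depending on the timezone passed to as.Date. It uses "Australia/Sydney" as a stand-in for my local zone (an assumption, so the result does not depend on the machine's locale) and passes tz explicitly rather than relying on the default:

```r
# Assumption: "Australia/Sydney" stands in for the local Australian Eastern zone.
x <- as.POSIXct("2018-01-01 03:00:00", tz = "Australia/Sydney")

# 03:00 AEDT is 16:00 on the previous day in UTC, so the UTC date is one day earlier.
date_utc   <- as.Date(x, tz = "UTC")
date_local <- as.Date(x, tz = "Australia/Sydney")

format(date_utc)    # "2017-12-31"
format(date_local)  # "2018-01-01"
```

So the shift can be avoided per call by passing tz to as.Date, but that has to be remembered every time, which is exactly the kind of thing beginners trip over.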
That's just one example of the issues caused by timezones. There are others, but I won't go into them here.
My hack-y solution
I don't want that behaviour, so I need to convince as.POSIXct not to mess with my timestamps. I generally do this by using tz="UTC", which works fine, except that I'm adding information to the data that isn't real. These times are NOT in UTC; I'm just saying that to avoid time-shift issues. It's a hack, and any time I give my data to someone else, they could be forgiven for thinking that the timestamps are in UTC when they're not. To avoid this, I generally add the actual timezone to the object/column name, and hope that anyone I pass my data on to will understand why someone would label an object with a timezone different to the one in the object itself:
df <- data.frame( timestamp.AET = as.POSIXct( c( "2018-01-01 03:00:00",
                                                 "2018-01-01 12:00:00" ),
                                              tz = "UTC" ),
                  a = 1:2 )
df$date <- as.Date( df$timestamp.AET )
df
# timestamp.AET a date
# 1 2018-01-01 03:00:00 1 2018-01-01
# 2 2018-01-01 12:00:00 2 2018-01-01
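As a rough sketch of what someone receiving this data would have to do to get back to real instants: take the wall-clock text of the "UTC"-labelled value and re-parse it in the actual zone. This is a base-R approximation of what lubridate::force_tz does, not a replacement for it, and "Australia/Sydney" is again an assumed stand-in for the real zone:

```r
# The hack: wall-clock time stored with a "UTC" label that isn't really true.
naive <- as.POSIXct("2018-01-01 03:00:00", tz = "UTC")

# Recover the wall-clock text, then re-parse it in the real zone (assumed here).
fmt  <- "%Y-%m-%d %H:%M:%S"
wall <- format(naive, fmt)
real <- as.POSIXct(wall, format = fmt, tz = "Australia/Sydney")

format(real, fmt)                                      # same wall-clock text as `wall`
offset_hours <- as.numeric(difftime(naive, real, units = "hours"))
offset_hours                                           # 11: the instants differ by the AEDT offset
```

The printed clock time is unchanged, but the underlying instant moves by 11 hours, which is why the "UTC" label is misleading to anyone who takes it at face value.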
What I'm hoping for
What I really want is a way to use POSIXct without having to specify a timezone. I don't want the times messed with in any way. Do everything as though the values were in UTC, and leave any timezone details like offsets, daylight savings, etc. to the user. Just don't pretend they actually ARE in UTC. Here's my ideal:
x <- as.POSIXct( "2018-01-01 03:00:00" ); x
# [1] "2018-01-01 03:00:00"
attr( x, "tzone" )
# [1] NULL
shifted <- lubridate::force_tz( x, "UTC" )
shifted == x
# [1] TRUE
as.numeric( shifted ) == as.numeric( x )
# [1] TRUE
as.Date( x )
# [1] "2018-01-01"
So there's no timezone attribute on the object at all. The date conversion works as one would expect from the printed value. If there are daylight savings time-shifts, or any other locale-specific issues, the user (me or someone else) needs to deal with that themselves.
I believe something similar to this is possible in POSIXlt, but I really don't want to shift to that. chron or another timeseries-oriented package might be another solution, but I think POSIXct is more widely used and accepted, and this seems like something that should be possible within base::. A POSIXct object with tz="UTC" is exactly what I need; I just don't want to have to lie about timezones in order to get it to behave the way I want (and, I believe, the way most beginners to R expect).
So what do others do here? Is there an easy way to use POSIXct without a timezone that I've missed? Is there a better work-around than tz="UTC"? Is that what others are doing?