
I have run into a very annoying problem that I want to share with the community. This is a question that I have found an acceptable solution for (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularly how plyr, dplyr, and readr handle these.

When working with POSIX variables (i.e., date and time stamps), I found that write_csv from readr converts these variables to UTC time.

I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the file. My timezone is MDT, and I am requesting data using MDT time, which I then try to bind to a file stored in UTC time, and the times don't match... it gets messy and frustrating. In essence, the beautiful time-stamp database I am trying to build is turning into a pile of garbage.

To remedy this problem, I converted the POSIX time column to a character column using:

df$time <- as.character(df$time)

This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API.
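A slightly more explicit variant of the same workaround (a sketch, assuming the column is `df$time` and a Mountain-time zone name of `America/Denver`): instead of a bare `as.character()`, `format()` can write the stamp as text with its UTC offset attached, so the zone information survives the round trip through the CSV.

```r
# Hypothetical example: format the POSIXct column as text with an
# explicit UTC offset before writing, so no zone information is lost.
df <- data.frame(time = as.POSIXct("2018-07-23 16:13:00", tz = "America/Denver"))
df$time <- format(df$time, "%Y-%m-%d %H:%M:%S %z")
df$time
# [1] "2018-07-23 16:13:00 -0600"
```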

This leads me to the following series of questions:

  1. Is there a function that can join POSIX variables across time zones? For instance, if it's noon MDT, it's 6 pm UTC. Could I join two data frames on these time stamps without having to convert them to the same time zone first?
  2. Is it possible to prevent write_csv from changing POSIX variables to UTC?
  3. Is there a csv write function that doesn't change POSIX variables?

EDIT: I have included some example data of what I am talking about:

> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300"))
> colnames(df1) <- c("time", "low", "high", "open", "close", "volume")
> df1$time <- anytime(df1$time)
> df1Sort <- df1[order(df1$time),]
> head(df1Sort, 5)
                   time     low    high    open   close    volume
299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168
298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668
297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707
296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157
295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921
> write_csv(df1Sort, "df1Sort.csv", col_names = TRUE)
> df2 <- read_csv("df1Sort.csv", col_names = TRUE)
Parsed with column specification:
cols(
  time = col_datetime(format = ""),
  low = col_double(),
  high = col_double(),
  open = col_double(),
  close = col_double(),
  volume = col_double()
)
> head(df2, 5)
# A tibble: 5 x 6
  time                  low  high  open close volume
  <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>
1 2018-07-23 22:13:00  7747  7747  7747  7747  9.20 
2 2018-07-23 22:14:00  7743  7747  7747  7747  7.02 
3 2018-07-23 22:15:00  7745  7746  7746  7746  0.908
4 2018-07-23 22:16:00  7746  7746  7746  7746  4.67 
5 2018-07-23 22:17:00  7746  7746  7746  7746  2.49 
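Note that the instant itself survives the round trip - 2018-07-23 22:13:00 UTC is the same moment as 16:13 MDT - so one workaround after reading the file back is to restore the display zone. A minimal sketch, assuming the Mountain zone name `America/Denver`:

```r
library(readr)
library(lubridate)

df2 <- read_csv("df1Sort.csv", col_names = TRUE)
# read_csv parses the times as UTC; with_tz() changes only the
# displayed zone, not the underlying instant.
df2$time <- with_tz(df2$time, tzone = "America/Denver")
head(df2$time, 1)
# [1] "2018-07-23 16:13:00 MDT"
```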
Phil_T
  • Can you share some examples of what kind of data you have and what it looks like? In terms of writing, a quick (and not so elegant) fix would be to write the time stamp as a character, with the corresponding time zone and offset information. I've used `fread` from `data.table` without any problems. Would be easier to help if we can see the data you're working with. – Gautam Jul 24 '18 at 02:02
  • *"When working with POSIX variables (aka, date and time stamps), I found that write_csv from readr changed these variables into UTC time."* I don't think that's true. I've just checked, and `write_csv` stores `POSIXct` in my system-specific timezone. There is definitely no conversion to "UTC". – Maurits Evers Jul 24 '18 at 02:02
  • 1
    I think there might be a different issue here; for example `as.Date(x)` for `x` a `POSIXct` object sets `tz = "UTC"` by default; so if you use `as.Date` on a `POSIXct` object and you *don't* explicitly match time-zones, times will be converted to UTC. This has nothing to do with `write_csv` though. – Maurits Evers Jul 24 '18 at 02:07
  • Maurits, this is almost what is going on. I do not use `as.Date()` anywhere in my code, but I noticed that my data is class POSIX; when I drop the timezone it's still listed as POSIX, and when I save it and reopen it, it is returned as UTC time. I'll update my post. – Phil_T Jul 24 '18 at 03:02
  • 1
    @MauritsEvers Really? Do you get a different result on the reproducible example in my answer? (Assuming you're not in the GMT+3 Addis Ababa timezone...) I wouldn't be too surprised if it is OS-specific - I'm on Windows currently. – Gregor Thomas Jul 24 '18 at 03:05
  • @Gregor, I am on Ubuntu 16. – Phil_T Jul 24 '18 at 03:16
  • 1
    @Gregor Ok, I've just checked; I'm on AEST, and I can reproduce your example: `read_csv` stores `t` in UTC. I'm not sure what I did earlier. I did check, but must've done something wrong. Sorry for the confusion! – Maurits Evers Jul 24 '18 at 03:18
  • It's weird, isn't it? I get this on multiple APIs, not just the Coinbase one that I showed as an example. – Phil_T Jul 24 '18 at 03:19

2 Answers

  1. "Is there a program that can join POSIX variables across time zones... without having to convert them to the same time zone first?"

    Maybe? But if so, it's almost certainly just converting to UTC under the hood and hiding it from you. I'm unaware of anything like this in R. (data.table is the only package I'm aware of that can join on anything other than exact equality, and it doesn't have this feature.) If I were you, I'd just convert everything to one timezone - probably UTC.

    For more reading on best practices, this SQL-focused answer seems very good.

  2. "Is it possible to prevent write_csv from changing POSIX variables to UTC?"

    Not built-in. The ?write_csv documentation is pretty clear: it doesn't list any options for this, and it does say "POSIXct's are formatted as ISO8601."

  3. "Is there a csv write function that doesn't change POSIX variables?"

    Sure; the built-in write.csv doesn't convert to UTC (I believe it uses system settings), and data.table::fwrite offers quite a few options. If you want to control how your dates are saved, your best bet is to convert them to character in whatever format you want; any of the writing functions should then handle them just fine. Check out the ?data.table::fwrite documentation - it has good info, and warns that the "write.csv" option can be quite slow.
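On point 1 specifically, it may help to know that POSIXct stores seconds since the epoch, so two values representing the same instant compare equal even when their tzone attributes differ - equality joins already match on the instant, not the printed clock time. A quick sketch (the values here are made up):

```r
t_mdt <- as.POSIXct("2018-07-23 12:00:00", tz = "America/Denver")  # noon MDT
t_utc <- as.POSIXct("2018-07-23 18:00:00", tz = "UTC")             # 6 pm UTC
t_mdt == t_utc
# [1] TRUE -- comparison uses the underlying seconds-since-epoch value

# So an equality join on POSIXct keys matches on the instant:
mdt <- data.frame(time = t_mdt, x = 1)
utc <- data.frame(time = t_utc, y = 2)
merge(mdt, utc, by = "time")  # one matching row
```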


You should include reproducible examples with your questions. Here's one for this:

t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa")
t
# [1] "2018-01-01 01:30:00 EAT"

d = data.frame(t)

library(readr)
write_csv(d, "tz_test.csv")
system("head tz_test.csv")
# 2017-12-31T22:30:00Z

library(data.table)
fwrite(d, "tz_test_dt.csv", dateTimeAs = "write.csv")
system("head tz_test_dt.csv")
# t
# 2018-01-01 01:30:00

write.csv(d, "tz_test_base.csv")
system("head tz_test_base.csv")
# "","t"
# "1",2018-01-01 01:30:00
Gregor Thomas

It looks like you're using libraries from the tidyverse; have you had a look at the lubridate library?

The help file for as_date() may help you here: it shows how to convert a date-time variable to your desired timezone before you append/join your data.

For example:

> dt_utc <- ymd_hms("2010-08-03 00:50:50")
> dt_utc
[1] "2010-08-03 00:50:50 UTC"

> as_datetime(dt_utc, tz = "Australia/Melbourne")
[1] "2010-08-03 10:50:50 AEST"
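Two related lubridate helpers are worth distinguishing here (a sketch; the times are illustrative): with_tz() keeps the instant and changes only the displayed zone, while force_tz() keeps the clock time and changes the instant.

```r
library(lubridate)

dt_utc <- ymd_hms("2010-08-03 00:50:50")  # parsed as UTC by default

# Same instant, displayed in Melbourne time:
with_tz(dt_utc, tzone = "Australia/Melbourne")
# [1] "2010-08-03 10:50:50 AEST"

# Same clock time, reinterpreted as Melbourne time (a different instant):
force_tz(dt_utc, tzone = "Australia/Melbourne")
# [1] "2010-08-03 00:50:50 AEST"
```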
  • I appreciate the answer and library recommendation. I don't have issues switching back and forth between timezones, or between character and POSIX. Rather, I was hoping for a solution that avoids having to do these transformations. – Phil_T Jul 24 '18 at 03:22