5

I have a set of data from across the US that I am trying to convert into local time for each "subject". I have UTC timestamps on each event and have converted those into POSIXct format, but every time I try to include a vector of tz = DS$Factor or tz = as.character(DS$Factor) in any of the POSIXct/POSIXlt functions (including format() and strftime()) I get an error that says:

Error in as.POSIXlt.POSIXct(x, tz = tz) : invalid 'tz' value

If I just enter tz = 'US/Eastern' it works fine, but of course not all of my values are from that time zone.

How do I get the time stamps into local time for each "subject"?

The DS$Factor has 5 values: US/Arizona US/Central US/Eastern US/Mountain US/Pacific

Thanks, Shorthand

Shorthand

3 Answers

2

Bringing in dplyr and lubridate, I wound up doing something like:

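library(lubridate)
library(dplyr)

df <- data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"),
                 localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = FALSE)

# Parse the strings as UTC
df$moment <- as.POSIXct(df$timestring, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# rowwise() hands force_tz() one time zone per row.
# Note: force_tz() keeps the clock time and changes the zone;
# with_tz() keeps the instant and changes the displayed clock time.
df <- df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))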

df
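The error in the question comes from the fact that `tz` has to be a single string, and a POSIXct vector carries only one time zone attribute. A minimal base R sketch of one way around that, assuming the goal is a printable local time per row: format each UTC instant in its own zone and keep the result as character. The `localstring` column name is made up for illustration, and this row-by-row version is not fast on large data.

```r
# Same example data as above; zone names are Olson/IANA identifiers
df <- data.frame(
  timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"),
  localzone  = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)
df$moment <- as.POSIXct(df$timestring, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# format() takes one tz at a time, so call it once per row.
# localstring is a hypothetical column name used only in this sketch.
df$localstring <- vapply(
  seq_len(nrow(df)),
  function(i) format(df$moment[i], format = "%Y-%m-%d %H:%M:%S",
                     tz = df$localzone[i], usetz = TRUE),
  character(1)
)

df$localstring
```

The result is a character column, so it is suited to display or export rather than further date arithmetic; the `moment` column still holds the actual instants.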
schnee
  • It seems the issue is that `as.POSIXct` doesn't accept a vector for `tz`; it only takes a single string. So using `rowwise()` works without having to use the `force_tz` workaround. This works: `df %>% rowwise() %>% mutate(moment = as.POSIXct(timestring, format="%Y-%m-%d %H:%M:%S", tz=localzone)) %>% ungroup()` – Lloyd Christmas May 04 '20 at 14:11
  • The `rowwise` approach is very, very slow. I recommend checking out these solutions, which are much faster: https://community.rstudio.com/t/working-with-timezones-in-lubridate/4260/6 – Lloyd Christmas May 04 '20 at 15:15
1

Actually, what I did was to loop through the time zones instead of the rows in the data set ... then it's much, much faster. I'll post code tomorrow.

In general, that's a lesson for R: don't loop through the big data frame, loop through the (much shorter) vector of categories and apply using the which() function.

As there are only 5 time zones, the loop only takes a few seconds now.

One other caveat is that if you put it into POSIXct format it will still graph the times in your machine's local time zone. So you need an extra step to then convert it into local time using force_tz().

cap$tdiff is really just created to make sure that the code is doing what it says it should be doing.

library(lubridate)

tzs <- as.character(unique(cap$timezone))

cap$localtimes <- as.POSIXlt(0,origin = "1970-01-01")

# Now loop through by time zone instead of by row of cap
for (i in seq_along(tzs)) {
  whichrows <- which(cap$timezone == tzs[i])

  cap[whichrows,"localtimes"] <-
    with_tz(cap[whichrows,"UTC"],tzone = tzs[i])
}

remove(i, whichrows)

# Sanity check: difference between the converted times (read as UTC) and the original UTC stamps
cap$tdiff <- as.numeric((force_tz(cap$localtimes, "UTC") - cap$UTC))
cap$localtime <- as.POSIXct(force_tz(cap$localtimes))
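The same loop-over-categories idea can be sketched in base R without lubridate. This is only an illustration under stated assumptions: the small `cap` data frame below is a made-up stand-in, the `localstring` column is hypothetical, and the local times are stored as character because a single POSIXct column cannot hold several time zones.

```r
# Hypothetical stand-in for the cap data frame: UTC instants plus a zone per row
cap <- data.frame(
  UTC = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32",
                     "2015-12-15 08:00:00"), tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York", "America/Los_Angeles"),
  stringsAsFactors = FALSE
)

# Character column (hypothetical name) to receive the formatted local times
cap$localstring <- NA_character_

# One vectorized format() call per distinct zone, not one call per row
for (tz in unique(cap$timezone)) {
  rows <- which(cap$timezone == tz)
  cap$localstring[rows] <- format(cap$UTC[rows], format = "%Y-%m-%d %H:%M:%S",
                                  tz = tz, usetz = TRUE)
}

cap$localstring
```

With only a handful of distinct zones the loop body runs a handful of times regardless of how many rows there are, which is where the speedup over a per-row loop comes from.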
Shorthand
  • I would add that this is one of the few times I still write a loop in R ... you could also do some kind of `group_by(timezone) %>% group_split() %>% map_dfr(function(df_i) { df_i$timestamp <- with_tz(df_i$timestamp, tzone = df_i$tzn[1]); df_i })` – Shorthand Nov 13 '19 at 18:44
0

So I was able to create a for loop to do this, but it is slow, taking about 10 minutes to run. I couldn't figure out an apply() syntax, and would certainly appreciate some help creating a faster, more parallelizable way of doing this operation, as the datastore has 768k observations and is growing.

require(lubridate)

loct <- NULL
# Convert one row at a time; rows with an empty time zone default to US/Eastern
for (i in 1:nrow(DS))
{
  loct[i] <- with_tz(DS$UTC[i], tzone =
    ifelse(DS$timezone[i]=="","US/Eastern",as.character(DS$timezone[i])))
}
DS$localtime <- as.POSIXct(loct, origin ="1970-01-01")
remove(loct, i)
Shorthand
  • I need to do the same thing. The best I've come up with is using a `for` loop as well. There has to be a better way. – josiekre Nov 22 '15 at 22:25