Vectorised time zone conversion with lubridate

Question

I have a data frame with a column of date-time strings:

library(tidyverse)
library(lubridate)

testdf = tibble(
  mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
  mydt = c('2018-01-17T09:15:00', '2018-01-17T09:16:00', '2018-01-17T09:18:00'))

testdf

#  A tibble: 3 x 2
#   mytz               mydt
#   <chr>              <chr>
# 1 Australia/Sydney   2018-01-17T09:15:00
# 2 Australia/Adelaide 2018-01-17T09:16:00
# 3 Australia/Perth    2018-01-17T09:18:00

I want to convert these date-time strings to POSIX date-time objects with their respective timezones:

testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz))

Error in mutate_impl(.data, dots) :
  Evaluation error: tz argument must be a single character string.
In addition: Warning message:
In if (tz != "UTC") { :
  the condition has length > 1 and only the first element will be used

I get the same result if I use ymd_hms without a timezone and pipe it into force_tz. Is it fair to conclude that lubridate doesn't support any sort of vectorisation when it comes to timezone operations?

Perhaps `testdf %>% rowwise %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz))`? – jazzurro, Jan 17 '18 at 00:34
That works! I don't understand why `rowwise` would be required, though... – jimjamslam, Jan 17 '18 at 00:35
I suppose if it's operating on groups, and each group has one row, then tz is only length 1 in each call. That would work, even if I had multiple rows with the same time zone. Thanks! – jimjamslam, Jan 17 '18 at 00:36
Interestingly, the tibble output from this prints the new column in `Australia/Perth`, even though I'm in `Australia/Sydney`. I wonder if that display is arbitrary. – jimjamslam, Jan 17 '18 at 00:44

score 5 · Accepted Answer · answered Jan 17 '18 at 03:22

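Another option is map2. It may be better to store the results in a list column: a date-time column can hold only a single time zone, so values parsed with different tz settings may otherwise get coerced to a single tz.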

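library(tidyverse)
library(lubridate)  # provides ymd_hms()

# parse each string with its own time zone; the result is a list column
out <- testdf %>%
  mutate(mydt_new = map2(mydt, mytz, ~ ymd_hms(.x, tz = .y)))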

If required, it can be unnested

out %>%
  unnest(cols = mydt_new)

The values in the list are

out %>%
   pull(mydt_new)
#[[1]]
#[1] "2018-01-17 09:15:00 AEDT"

#[[2]]
#[1] "2018-01-17 09:16:00 ACDT"

#[[3]]
#[1] "2018-01-17 09:18:00 AWST"
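
As an illustration of why the list keeps more information (a sketch added for clarity, not part of the original answer), the code below checks the time zone attached to each list element and then expresses the same instants in one common zone. The names `parsed` and `as_utc` are hypothetical and introduced here only for the example.

```r
library(lubridate)
library(purrr)

mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")
mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")

# Parse each string in its own zone; the result is a list of length-1 POSIXct
parsed <- map2(mydt, mytz, ~ ymd_hms(.x, tz = .y))

# Each list element keeps its own time zone attribute
map_chr(parsed, tz)
# "Australia/Sydney" "Australia/Adelaide" "Australia/Perth"

# The same instants expressed in a single zone (UTC), formatted as strings
as_utc <- map_chr(parsed, ~ format(with_tz(.x, "UTC"), "%Y-%m-%d %H:%M:%S"))
as_utc
# "2018-01-16 22:15:00" "2018-01-16 22:46:00" "2018-01-17 01:18:00"
```

Flattening the list into one date-time vector leaves room for only one time zone attribute, which is why the zone that survives depends on how the values are combined.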

akrun

Ahh—so keep my datetimes as a list column instead? I hadn't thought about that! – jimjamslam Jan 17 '18 at 03:23
@rensa It is better because a date-time column allows only a single timezone, and the result of coercing to a single one may differ. – akrun Jan 17 '18 at 03:24
In my particular use case right now, coercing to a single timezone isn't problematic, but I agree that this is more generally preferable :) – jimjamslam Jan 17 '18 at 03:25
@rensa What is interesting is that if I use `unnest` I get a different value compared to jazzurro's. So, it would be coercing to a different `tz` – akrun Jan 17 '18 at 03:26
Although `rowwise` and `ungroup` are a good solution, I think this is probably the preferable in order to keep the time zone info as it is. – jimjamslam Jan 17 '18 at 03:27
@rensa I think jazzurro's is also correct, as you don't really mind ending up with a single timezone. His answer coerces to `"Australia/Perth"`, which is the last entry, while mine coerces to `UTC` after `unnest`, i.e. `out %>% unnest %>% pull(mydt_new) %>% tz(.)` – akrun Jan 17 '18 at 03:28
If I were able, I'd accept both your answers. But I think that, as a general case, it's a good idea to prioritise the predictability of the output. – jimjamslam Jan 17 '18 at 03:29
@akrun I recently answered [this question](https://stackoverflow.com/questions/59833660/convert-to-local-time-zone-from-latitude-and-longitude-r/59842152#59842152), which reminded me of this one. I have been wondering whether it is safe and/or better to keep time zones separate from the time. In the linked post, I created the time zone as a new column and converted the time to each time zone. Even after `unnest` I see the correct time. I want to know how you usually handle time. Any advice? – jazzurro Jan 22 '20 at 09:49
@jazzurro Good to hear from you. I guess you are standardizing the time zone based on 'GMT', right? If that is the case, I would do the same thing to get the time zone individually – akrun Jan 22 '20 at 21:23
@akrun Thanks for your reply. I am glad to hear that you would do the same. I'll stick to the approach. By the way, you are still rocking here. That's cool. I still have lots to learn from you. – jazzurro Jan 23 '20 at 02:33
@akrun I think you are the main contributor for my reputation. Much appreciated. :) – jazzurro Jan 24 '20 at 06:34

score 3 · Answer 2 · answered Jan 17 '18 at 00:46

The error "tz argument must be a single character string" indicates that more than one time zone was passed to ymd_hms(). To make sure that only one time zone is passed to the function at a time, I used rowwise(). Note that I am not in an Australian time zone, so I am not sure whether my output is identical to yours.

testdf <- tibble(mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
                 mydt = c('2018-01-17 09:15:00', '2018-01-17 09:16:00', '2018-01-17 09:18:00'))

# rowwise() makes mutate() call ymd_hms() once per row, so tz has length 1
testdf %>%
  rowwise() %>%
  mutate(mydt_new = ymd_hms(mydt, tz = mytz))

  mytz               mydt                mydt_new           
  <chr>              <chr>               <dttm>             
1 Australia/Sydney   2018-01-17 09:15:00 2018-01-17 06:15:00
2 Australia/Adelaide 2018-01-17 09:16:00 2018-01-17 06:46:00
3 Australia/Perth    2018-01-17 09:18:00 2018-01-17 09:18:00
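
The values in `mydt_new` look shifted because the column is displayed in a single zone (Perth here), not because the parsing went wrong. A minimal check, added as an illustration with the hypothetical names `sydney` and `perth_view`, shows that 09:15 in Sydney and 06:15 in Perth are the same instant:

```r
library(lubridate)

# 09:15 wall-clock time in Sydney (AEDT, UTC+11 in January)
sydney <- ymd_hms("2018-01-17 09:15:00", tz = "Australia/Sydney")

# with_tz() keeps the instant and changes only the zone used for display
perth_view <- with_tz(sydney, "Australia/Perth")
format(perth_view, "%Y-%m-%d %H:%M:%S")
# "2018-01-17 06:15:00"

# Both objects refer to the same moment in time
sydney == perth_view
# TRUE
```

The same arithmetic explains the second row: 09:16 in Adelaide (UTC+10:30) is 06:46 in Perth (UTC+8).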

jazzurro

Yep, I get the same output. It also appears that the tibble printing method chooses a timezone from those in the column, rather than the user's local time zone. – jimjamslam Jan 17 '18 at 00:47
@rensa I see. That is something good to know. Thanks for the information. :) – jazzurro Jan 17 '18 at 00:49
@jazzurro I posted as a solution as it was interesting comparison about how the timezones are coerced. Hope you don't mind. – akrun Jan 17 '18 at 03:31
@akrun I realized you were asking me to check the code you had in the deleted comment. I just could not reply to you at that moment. Sorry. I think you investigated an interesting case. I have more to ask on this point, but I cannot commit myself here right now. Could I contact you here later? – jazzurro Jan 17 '18 at 03:37
@jazzurro sure, I thought it would give the correct answer, but the timezone change happens earlier. So, wrapping with `list` didn't help and so i deleted that code – akrun Jan 17 '18 at 03:38
@akrun Finally, I am settled. Seeing the difference, I first wondered how my code worked. I thought I was creating a date object with a time zone (e.g., the 1st row with Sydney). Then I realized that the results were displayed with the time zone for Perth. Did the code initially produce date objects in accordance with the time zones? I also had another question, but this requires more typing. Whenever you have time, would you be able to chat? – jazzurro Jan 17 '18 at 12:38
@jazzurro Sure, right now I have a call within 10 minutes. Talk to you later. I think the rowwise is somehow coercing it to a single timezone; otherwise, wrapping with `list` would have saved it – akrun Jan 17 '18 at 12:40
@akrun Got it. Let me know when you find a bit of spare time. – jazzurro Jan 17 '18 at 12:42
@jazzurro If you use `as.character`, the type will be different, but it should give similar values: `testdf %>% rowwise %>% mutate(mydt_new = as.character(ymd_hms(mydt, tz = mytz)))` – akrun Jan 17 '18 at 12:46
@akrun I followed your idea and did the following: `testdf %>% rowwise %>% mutate(mydt_new = format(ymd_hms(mydt, tz = mytz), usetz = TRUE))`. – jazzurro Jan 17 '18 at 14:34
@akrun Yeah, I think so too. As long as the result stay in list, we cannot see what is there. So having this type of information is a good thing. By the way, given some time difference, it may be hard to have a chat. Let me briefly explain the question I have. Just like the OP here, I had some moments that I received error message in mutate(). I cannot come up with any specific case since I do not remember any now. In these cases rowwise was the solution. Have you come across this kind of moment? I think the key thing that I am missing is when rowwise is required. – jazzurro Jan 18 '18 at 01:36
@akrun I am not particularly surprised to see why the OP got confused since he was expecting two elements in each row to be passed to ymd_hms(). But that was not the case. I wish I can come up with an example, but this is all I can tell right now. – jazzurro Jan 18 '18 at 01:39
@jazzurro I think the `map` functions could be faster compared to `rowwise` (if I am not mistaken). In this specific case, if it is not kept in a list, the timezone would collapse. But, let's say we have a rowwise operation and the return values have the same type etc., then one of the map variants like `map_lgl`, `map_dbl` etc. could be useful. – akrun Jan 18 '18 at 02:11
@akrun Yeah, I think rowwise is slow (or slower). Can we use `Map()` as well? Yesterday, I had no issue with `testdf %>% mutate(new.time = Map(function(x, y) ymd_hms(x, tz = y), mytz, mydt))`. But today I get warning messages. – jazzurro Jan 18 '18 at 03:01
@akrun My bad. The order of the x and y arguments was the wrong way around. Now, what is the advantage of using map2 here compared to the classic Map? – jazzurro Jan 18 '18 at 03:09
@jazzurro I guess the purrr functions are consistent in the order of function arguments. I haven't checked whether there is any speed advantage or not. But I am assuming that these functions are optimized – akrun Jan 18 '18 at 05:42
@akrun I see. Thank you for that info. Going back to what I mentioned earlier, why did the OP's code not work? Obviously, more than one time zone was getting into the tz argument, but why was that the case? – jazzurro Jan 18 '18 at 08:28
@jazzurro The reason is that `ymd_hms` takes a single `tz` and is not vectorized over `tz`, so it breaks when several are supplied. If you do `testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz[1]))` it works. The error is not related to `mutate`: if you do it outside the chain, `ymd_hms(testdf$mydt, tz = testdf$mytz)` gives `Error in C_force_tz(time, tz = tzone, roll) : tz argument must be a single character string` – akrun Jan 18 '18 at 08:41
@akrun Got it. So if you see similar behavior in mutate, you would suspect a similar reason behind the error message? – jazzurro Jan 18 '18 at 08:45
@jazzurro There are certain classes for which `mutate` would give an error, e.g. the `POSIXlt` class – akrun Jan 18 '18 at 08:46
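
For readers who do not need to keep a separate zone per value, one possible vectorised route is lubridate's `force_tzs()`, which pairs each clock time with its own zone and returns a single vector in one output zone. This is a sketch added for illustration and was not proposed in the thread; it assumes the installed lubridate version provides `force_tzs()`, and the names `clock` and `instants` are hypothetical.

```r
library(lubridate)

mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")
mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")

# Parse the clock times without a zone (ymd_hms() defaults to UTC)
clock <- ymd_hms(mydt)

# Reinterpret each clock time in its own zone; the result is one POSIXct
# vector expressed in a single output zone (UTC here)
instants <- force_tzs(clock, tzones = mytz, tzone_out = "UTC")
format(instants, "%Y-%m-%d %H:%M:%S")
# "2018-01-16 22:15:00" "2018-01-16 22:46:00" "2018-01-17 01:18:00"
```

The trade-off matches the discussion above: the instants are correct, but the original zone of each row survives only in the `mytz` column, not in the date-time values themselves.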
