
I have a data frame with a column of date-time strings:

library(tidyverse)
library(lubridate)

testdf <- tibble(
  mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
  mydt = c('2018-01-17T09:15:00', '2018-01-17T09:16:00', '2018-01-17T09:18:00'))

testdf

#  A tibble: 3 x 2
#   mytz               mydt
#   <chr>              <chr>
# 1 Australia/Sydney   2018-01-17T09:15:00
# 2 Australia/Adelaide 2018-01-17T09:16:00
# 3 Australia/Perth    2018-01-17T09:18:00

I want to convert these date-time strings to POSIX date-time objects with their respective timezones:

testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz))

Error in mutate_impl(.data, dots) : Evaluation error: tz argument must be a single character string.
In addition: Warning message:
In if (tz != "UTC") { : the condition has length > 1 and only the first element will be used

I get the same result if I use ymd_hms without a timezone and pipe it into force_tz. Is it fair to conclude that lubridate doesn't support any sort of vectorisation when it comes to timezone operations?
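A minimal base-R sketch of why this fails and one way around it. Like ymd_hms(), base R's as.POSIXct() takes a single tz per call. So the sketch parses each (string, zone) pair separately with mapply() and collects the resulting instants in one vector. This swaps in base R for lubridate, and it assumes your system's time zone database knows the Australian zones.

```r
# Base R only (no lubridate): the strings and zones from the question.
mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")
mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")

# as.POSIXct() also accepts only one tz per call, so parse each
# (string, zone) pair separately and keep the result as seconds since the epoch.
secs <- mapply(
  function(x, z) as.numeric(as.POSIXct(x, tz = z, format = "%Y-%m-%dT%H:%M:%S")),
  mydt, mytz,
  USE.NAMES = FALSE
)

# A POSIXct vector carries a single "tzone" attribute, so the combined vector
# holds the correct instants but displays them all in one zone (UTC here).
utc <- .POSIXct(secs, tz = "UTC")
print(format(utc, usetz = TRUE))
```

With AEDT (UTC+11), ACDT (UTC+10:30) and AWST (UTC+8) in effect in January 2018, the three instants come out as 2018-01-16 22:15:00, 2018-01-16 22:46:00 and 2018-01-17 01:18:00 UTC.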

jimjamslam

2 Answers


Another option is map2(). It may be better to store the outputs with different time zones in a list column. A single date-time column can hold only one tz, so otherwise the values may get coerced to a single tz.

library(tidyverse)
library(lubridate)
out <- testdf %>%
         mutate(mydt_new = map2(mydt, mytz, ~ymd_hms(.x, tz = .y)))

If required, the list column can be unnested:

out %>%
   unnest(cols = mydt_new)

The values in the list column are:

out %>%
   pull(mydt_new)
#[[1]]
#[1] "2018-01-17 09:15:00 AEDT"

#[[2]]
#[1] "2018-01-17 09:16:00 ACDT"

#[[3]]
#[1] "2018-01-17 09:18:00 AWST"
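For comparison, here is a rough base-R counterpart of the map2() approach using Map(), which comes up in the comments. This is a sketch assuming base R only. It keeps the list-of-POSIXct idea but does not reproduce the tibble or unnest() steps. Each list element keeps its own "tzone" attribute.

```r
# Base R counterpart of the map2() answer, using Map().
mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")
mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")

# Map(f, x, y) calls f(x[[i]], y[[i]]) and returns a list, so each element
# is a separate POSIXct vector with its own "tzone" attribute.
parsed <- Map(
  function(x, z) as.POSIXct(x, tz = z, format = "%Y-%m-%dT%H:%M:%S"),
  mydt, mytz
)
parsed <- unname(parsed)  # Map() names the result after mydt; drop those names

# Inspect the zone and the local wall-clock time of each element.
zones <- vapply(parsed, function(t) attr(t, "tzone")[1], character(1))
local <- vapply(parsed, format, character(1), usetz = TRUE)
print(zones)
print(local)
```

Note the argument order: the function's parameters are matched positionally to the vectors passed to Map(), which is the slip discussed further down in the comments.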
akrun
  • Ahh, so keep my datetimes as a list column instead? I hadn't thought about that! – jimjamslam Jan 17 '18 at 03:23
  • @rensa It is better because a date-time column allows only a single timezone, and coercing everything to one timezone may give different results. – akrun Jan 17 '18 at 03:24
  • In my particular use case right now, coercing to a single timezone isn't problematic, but I agree that this is more generally preferable :) – jimjamslam Jan 17 '18 at 03:25
  • @rensa What is interesting is that if I use `unnest`, I get different values compared to jazzurro's answer. So, it is coercing to a different `tz`. – akrun Jan 17 '18 at 03:26
  • Although `rowwise` and `ungroup` are a good solution, I think this one is probably preferable, in order to keep the time zone info as it is. – jimjamslam Jan 17 '18 at 03:27
  • @rensa I think jazzurro's answer is also correct, as you don't mind having a single timezone. His answer coerces to `"Australia/Perth"`, which is the last entry, while mine coerces to `UTC` after `unnest`, i.e. `out %>% unnest %>% pull(mydt_new) %>% tz(.)` – akrun Jan 17 '18 at 03:28
  • If I were able, I'd accept both your answers. But I think that, in the general case, it's a good idea to prioritise the predictability of the output. – jimjamslam Jan 17 '18 at 03:29
  • @akrun I recently answered [this question](https://stackoverflow.com/questions/59833660/convert-to-local-time-zone-from-latitude-and-longitude-r/59842152#59842152), which reminded me of this question. I have been wondering whether it is safe and/or better to keep time zones separate from the times. In the linked post, I created the time zone as a new column and converted each time to its own time zone. Even after `unnest` I see the correct times. I want to know how you usually handle times. Any advice? – jazzurro Jan 22 '20 at 09:49
  • @jazzurro Good to hear from you. I guess you are standardizing the time zone based on 'GMT', right? If that is the case, I would do the same thing to get the time zone individually – akrun Jan 22 '20 at 21:23
  • @akrun Thanks for your reply. I am glad to hear that you would do the same. I'll stick to that approach. By the way, you are still rocking here. That's cool. I still have lots to learn from you. – jazzurro Jan 23 '20 at 02:33
  • @akrun I think you are the main contributor to my reputation. Much appreciated. :) – jazzurro Jan 24 '20 at 06:34

The error "tz argument must be a single character string." indicates that more than one time zone is being passed to ymd_hms(). To make sure that only one time zone is passed to the function at a time, I used rowwise(). Note that I am not in an Australian time zone, so I am not sure whether my output is identical to yours.

testdf <- tibble(mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
                 mydt = c('2018-01-17 09:15:00', '2018-01-17 09:16:00', '2018-01-17 09:18:00'))

testdf %>%
  rowwise() %>%
  mutate(mydt_new = ymd_hms(mydt, tz = mytz)) %>%
  ungroup()

  mytz               mydt                mydt_new           
  <chr>              <chr>               <dttm>             
1 Australia/Sydney   2018-01-17 09:15:00 2018-01-17 06:15:00
2 Australia/Adelaide 2018-01-17 09:16:00 2018-01-17 06:46:00
3 Australia/Perth    2018-01-17 09:18:00 2018-01-17 09:18:00
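The mydt_new column above shows every row in Perth time because one POSIXct column has a single display zone. Here is a base-R sketch, assuming the zone names are kept in their own column, of how each row can be shown in its own local time again. This is similar to the format(..., usetz = TRUE) idea in the comments. The result is character, not POSIXct.

```r
mydt <- c("2018-01-17 09:15:00", "2018-01-17 09:16:00", "2018-01-17 09:18:00")
mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")

# Build one POSIXct column of the correct instants (one zone per parse),
# then give the whole column a single display zone, Perth, as in the output above.
secs <- mapply(
  function(x, z) as.numeric(as.POSIXct(x, tz = z, format = "%Y-%m-%d %H:%M:%S")),
  mydt, mytz,
  USE.NAMES = FALSE
)
col <- .POSIXct(secs, tz = "Australia/Perth")
print(format(col, "%H:%M"))

# With the zone names kept alongside, each row can be formatted in its own
# local time; format() returns character strings, not date-times.
local <- vapply(
  seq_along(col),
  function(i) format(col[i], tz = mytz[i], usetz = TRUE),
  character(1)
)
print(local)
```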
jazzurro
  • Yep, I get the same output. It also appears that the tibble printing method chooses a timezone from those in the column, rather than the user's local time zone. – jimjamslam Jan 17 '18 at 00:47
  • @rensa I see. That is something good to know. Thanks for the information. :) – jazzurro Jan 17 '18 at 00:49
  • @jazzurro I posted it as a solution because it was an interesting comparison of how the timezones are coerced. Hope you don't mind. – akrun Jan 17 '18 at 03:31
  • @akrun I realized you were asking me to check the code in your deleted comment. I just could not reply to you at that moment. Sorry. I think you investigated an interesting case. I have more to ask on this point, but I cannot commit to it right now. Could I contact you here later? – jazzurro Jan 17 '18 at 03:37
  • @jazzurro Sure. I thought it would give the correct answer, but the timezone change happens earlier, so wrapping with `list` didn't help, and I deleted that code. – akrun Jan 17 '18 at 03:38
  • @akrun Finally, I am settled. Seeing the difference, I first wondered how my code worked. I thought I was creating a date object with a time zone (e.g., the 1st row with Sydney). Then I realized that the results were displayed with the time zone for Perth. Did the code initially produce date objects in accordance with the time zones? I also had another question, but this requires more typing. Whenever you have time, would you be able to chat? – jazzurro Jan 17 '18 at 12:38
  • @jazzurro Sure, but I have a call in 10 minutes. Talk to you later. I think `rowwise` is somehow coercing it to a single timezone; otherwise, wrapping with `list` would have saved it. – akrun Jan 17 '18 at 12:40
  • @akrun Got it. Let me know when you find a bit of spare time. – jazzurro Jan 17 '18 at 12:42
  • @jazzurro If you use `as.character`, the type will be different, but you should get similar values: `testdf %>% rowwise %>% mutate(mydt_new = as.character(ymd_hms(mydt, tz = mytz)))` – akrun Jan 17 '18 at 12:46
  • @akrun I followed your idea and did the following: `testdf %>% rowwise %>% mutate(mydt_new = format(ymd_hms(mydt, tz = mytz), usetz = TRUE))`. – jazzurro Jan 17 '18 at 14:34
  • @akrun Yeah, I think so too. As long as the result stays in a list, we cannot see what is there, so having this type of information is a good thing. By the way, given the time difference, it may be hard to have a chat, so let me briefly explain my question. Just like the OP here, I have had moments when I received error messages in mutate(). I cannot come up with a specific case since I do not remember any now. In those cases, rowwise was the solution. Have you come across this kind of situation? I think the key thing I am missing is when rowwise is required. – jazzurro Jan 18 '18 at 01:36
  • @akrun I am not particularly surprised that the OP got confused, since he was expecting the two elements in each row to be passed to ymd_hms(). But that was not the case. I wish I could come up with an example, but this is all I can tell right now. – jazzurro Jan 18 '18 at 01:39
  • @jazzurro I think the `map` functions could be faster than `rowwise` (if I am not mistaken). In this specific case, if the result is not kept in a list, the timezone would collapse. But if we have a rowwise operation and the return values have the same type, then one of the map variants like `map_lgl`, `map_dbl`, etc. could be useful. – akrun Jan 18 '18 at 02:11
  • @akrun Yeah, I think rowwise is slow (or slower). Can we use `Map()` as well? Yesterday, I had no issue with `testdf %>% mutate(new.time = Map(function(x, y) ymd_hms(x, tz = y), mytz, mydt))`. But today I get warning messages. – jazzurro Jan 18 '18 at 03:01
  • @akrun My bad: the order of the x and y arguments was the other way around. Now, what is the advantage of using map2 here compared to the classic Map? – jazzurro Jan 18 '18 at 03:09
  • @jazzurro I guess the purrr functions are consistent in the order of function arguments. I haven't checked whether there is any speed advantage or not. But I am assuming that these functions are optimized – akrun Jan 18 '18 at 05:42
  • @akrun I see. Thank you for that info. Going back to what I mentioned earlier, why did the OP's code not work? Obviously, more than one time zone was getting into the tz argument, but why was that the case? – jazzurro Jan 18 '18 at 08:28
  • @jazzurro The reason is that `ymd_hms` takes a single `tz` and is not vectorized over `tz`. As there are multiple values of `tz`, it breaks. If you do `testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz[1]))`, it works. The error is not related to `mutate`: if you do it outside the chain, `ymd_hms(testdf$mydt, tz = testdf$mytz)` fails with `Error in C_force_tz(time, tz = tzone, roll) : tz argument must be a single character string` – akrun Jan 18 '18 at 08:41
  • @akrun Got it. So if you see similar behavior in mutate, would you suspect a similar reason behind the error message? – jazzurro Jan 18 '18 at 08:45
  • @jazzurro There are certain classes for which `mutate` gives an error, e.g. the `POSIXlt` class. – akrun Jan 18 '18 at 08:46
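On the question of vectorisation, lubridate also provides force_tzs(). It reinterprets element i's clock time in the i-th zone and returns the results in one output zone. The sketch below assumes a lubridate version that includes force_tzs(). The choice of Perth as the output zone is only there to mirror the rowwise() output above.

```r
library(lubridate)

mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")
mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")

# ymd_hms() parses every string as clock time in a single zone (UTC by default).
clock <- ymd_hms(mydt)

# force_tzs() reads element i's clock time in mytz[i], then returns a single
# vector in one output zone, because a POSIXct vector has only one tzone.
out <- force_tzs(clock, tzones = mytz, tzone_out = "Australia/Perth")
print(out)
```

The per-row zones are still lost in the output, so if they matter, keep mytz in its own column (or use a list column as in the map2() answer).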