3

I recently came across dplyr and - as a newbie - like it very much. Hence, I try to convert some of my base-R code into dplyr-code.

Working with air traffic control data, I am struggling with coercing timestamps using lubridate and as.POSIXlt to parse timestamps embedded in a mutate_each() call. I need the POSIXlt format as I have to work with local times (at different locations) later on. Reading in the data delivers a data frame of characters. The following is a simplistic example:

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013 04:17:11","01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC)

The function I wrote reads as follows:

make_POSIXlt <- function(vec, tz="UTC"){
vec <- parse_date_time(vec, orders="dmy_hms", tz=tz)
vec <- as.POSIXlt(vec, tz=tz)
}

The code works fine when executed with a single column:

flights$MVT_TIME_UTC <- make_POSIXlt(flights$MVT_TIME_UTC)

If I run the following dplyr code the function fails:

flights$BLOCK_TIME_UTC <- mutate_each(flights, funs(make_POSIXlt(.)), MVT_TIME_UTC)
Error: wrong result size (9), expected 6 or 1

The issue should be linked with the as.POSIXlt call. If this line is commented out the code works within mutate_each and coerces the timestamp into POSIXct.

Any idea/help on what is wrong? Obviously, my data has several timestamps that I would like to coerce with mutate_each (or any other suitable dplyr function) ...

bmike
  • 917
  • 2
  • 22
  • 41
Ray
  • 2,008
  • 14
  • 21
  • 4
    dplyr will use data.table by default and data.table [does not support POSIXlt](http://stackoverflow.com/questions/21487614/error-creating-r-data-table-with-date-time-posixlt). Use POSIXct instead. – hrbrmstr Dec 24 '14 at 18:53
  • Thx, hrbrmstr. Sorry for asking the obvious ... actually, I should have realised when commenting out the as.POSIXlt that something goes wrong there. Life could be so easy, if one uses his brain .. ;) – Ray Dec 24 '14 at 19:15
  • 1
    @Ray I can assure you that your question is far from being obvious – David Arenburg Dec 24 '14 at 20:19
  • 1
    @hrbrmstr I don't understand why dplyr would use data.table on a data.frame by default... Does dplyr’s data.frames support posixlt? – Arun Dec 24 '14 at 20:26
  • The development version of dplyr has been updated to a clearer error message to tell us that `mutate does not support POSIXlt results`. – aosmith Dec 24 '14 at 20:34
  • 3
    Heck, data.frame (of any variety) doesn't accept `POSIXlt`-class gracefully, either. POSIXlt is a multi-level list structure with each entry being a named list. I'm not exactly sure where it all goes wrong with data.frames, but I know that it does. Adding a warning to dplyr will put that package one-up on the help page for `data.frame` which has no such warning. – IRTFM Dec 24 '14 at 21:30
  • 1
    See also issues on `POSIXlt` on `dplyr` github [**here**](https://github.com/hadley/dplyr/issues/670), [**here**](https://github.com/hadley/dplyr/issues/813) – Henrik Dec 24 '14 at 21:36
  • Thanks to all of you for coming back on this and the useful pointers. It is also encouraging to see that the error (in this case for dplyr) will change one day :) Will work with POSIXct. Have a great holiday season! – Ray Dec 26 '14 at 11:26

1 Answers1

1

Revisiting my question about 4 years later, I realised that I forgot to mark it as answered. However, this also gives me the chance to document how this (relatively) simple type coercion can (meanwhile) elegantly solved with dplyr and lubridate.

Key lesson learned:

  1. never use POSIXlt with a data frame (and its later brother tibble, although you can now work with list columns).
  2. coerce date-timestamps with the helpful parser functions from the lubridate package.

For the example from above

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013   04:17:11","01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC)

flights <- flights %>% mutate(MVT_TIME_UTC = lubridate::dmy_hms(MVT_TIME_UTC)

will coerce the timestamps in MVT_TIME_UTC. Check the documentation on lubridate for other parsers and/or how to handle local time zones.

Ray
  • 2,008
  • 14
  • 21