
I'm confused about the behavior of Hadley's "rbind.fill" function. I have a list of data frames I would like to do a simple rbind operation on, but the rbind.fill function is giving me results that I cannot explain. Note that the "rbind" function does give me the output I expect. Here is the minimal example:

library(reshape)      
data1 <- structure(list(DATE = structure(c(1277859600, 1277856000), class = c("POSIXct", 
                   "POSIXt"), tzone = "GMT"), BACK = c(0, -1)), .Names = c("DATE", 
                    "BACK"), row.names = 1:2, class = "data.frame")
data2 <- structure(list(DATE = structure(c(1277856000, 1277852400), class = c("POSIXct", 
                   "POSIXt"), tzone = "GMT"), BACK = c(0, -1)), .Names = c("DATE", 
                    "BACK"), row.names = 1:2, class = "data.frame")
bind1 <- rbind.fill(list(data1, data2))
bind2 <- rbind(data1, data2)
data1
data2
bind1
bind2
                 DATE BACK
1 2010-06-30 01:00:00    0
2 2010-06-30 00:00:00   -1
                 DATE BACK
1 2010-06-30 00:00:00    0
2 2010-06-29 23:00:00   -1
                 DATE BACK
1 2010-06-29 18:00:00    0
2 2010-06-29 17:00:00   -1
3 2010-06-29 17:00:00    0
4 2010-06-29 16:00:00   -1
                 DATE BACK
1 2010-06-30 01:00:00    0
2 2010-06-30 00:00:00   -1
3 2010-06-30 00:00:00    0
4 2010-06-29 23:00:00   -1

So as you can see, bind1 which contains the rbind.fill output creates new times in the DATE column that were not even in the original dataset. Is this expected behavior? I am aware that I can simply use
bind <- do.call(rbind, list(data1, data2))
to bind the 5000+ data frames I have, but can anyone explain the behavior shown above?
Thank you.

Edit:
As @DWin pointed out below, this was not a problem with the rbind.fill function itself: the times are stored in GMT, but the output was being printed in Pacific time.

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] tcltk     grid      stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] tcltk2_1.1-5  reshape_0.8.4 plyr_1.4      proto_0.3-9.1

loaded via a namespace (and not attached):
[1] ggplot2_0.8.9 tools_2.12.1 
Ben Bolker
Rguy
  • I can't replicate this behavior. What's your `sessionInfo()`? – Joshua Ulrich May 23 '11 at 17:39
  • It seems, as DWin pointed out (and as you seem to be expecting) this was a timezone problem and is only indirectly related to the rbind.fill function itself (I assume it uses "print.POSIXct" somewhere within?) – Rguy May 23 '11 at 18:22

1 Answer


Most likely what you are seeing is the behavior of print.POSIXct interacting with timezone settings on your machine. I get exactly the same output for the two function calls.

> rbind.fill(list(data1,data2)) == rbind(data1,data2)
  DATE BACK
1 TRUE TRUE
2 TRUE TRUE
3 TRUE TRUE
4 TRUE TRUE
> identical( rbind.fill(list(data1,data2)) ,  rbind(data1,data2) )
[1] TRUE

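POSIXct times are stored internally as seconds since the epoch in UTC (GMT); when printed, they are shown in the zone given by the object's tzone attribute, or in the current time zone if that attribute is missing. Note that as.POSIXct has a tz argument: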

tz   A timezone specification to be used for the conversion, if one is required. 
     System-specific (see time zones), but "" is the current timezone, and "GMT" is 
     UTC (Universal Time, Coordinated).
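To make the effect concrete, here is a minimal sketch using two of the instants from the question. It assumes base R only and that the Olson zone name "America/Los_Angeles" is available on your system; the dropped attribute is simulated by hand and is not meant to show what rbind.fill does internally.

```r
# Two instants from the question, stored as seconds since the epoch.
x <- as.POSIXct(c(1277859600, 1277856000), origin = "1970-01-01", tz = "GMT")

# The same instants rendered in two different zones.
format(x, "%Y-%m-%d %H:%M", tz = "GMT")
# "2010-06-30 01:00" "2010-06-30 00:00"
format(x, "%Y-%m-%d %H:%M", tz = "America/Los_Angeles")
# "2010-06-29 18:00" "2010-06-29 17:00"

# Dropping the "tzone" attribute leaves the stored numbers untouched;
# only the zone used for printing changes (it falls back to the session's zone).
y <- x
attr(y, "tzone") <- NULL
all(as.numeric(x) == as.numeric(y))
# TRUE
```

The 18:00 and 17:00 values match the "new" times in bind1, which is consistent with the same instants being displayed in Pacific time.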

If you type ?locales, you will see the functions to get and set locale settings, although these vary from OS to OS, so my experience on a Mac may not match yours on a different OS. I try to use the Date class rather than the POSIX classes, but that is just because I have no particular need for the added time-of-day detail. There are additional functions in the chron and lubridate packages that you may want to examine.
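One way to keep printed times independent of the machine's settings is to state the zone explicitly, either on the object or for the whole session. The following is only a sketch under assumptions: it uses base R, the variable names are made up for illustration, and the lost attribute is simulated by hand.

```r
# Simulate a POSIXct column whose "tzone" attribute was lost along the way.
dates <- as.POSIXct(c(1277859600, 1277856000), origin = "1970-01-01", tz = "GMT")
attr(dates, "tzone") <- NULL

# Option 1: put the attribute back on the object itself.
fixed <- dates
attr(fixed, "tzone") <- "GMT"
format(fixed, "%Y-%m-%d %H:%M")   # formatted in the object's own zone
# "2010-06-30 01:00" "2010-06-30 00:00"

# Option 2: set the zone for the whole session, then restore the old value.
old_tz <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "GMT")
session_view <- format(dates, "%Y-%m-%d %H:%M")   # no tzone, so the session zone is used
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```

Option 1 travels with the data, so it is usually the safer choice when objects are passed between functions that may or may not preserve attributes.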

IRTFM
  • Yes, that seems to be the problem. Thank you for your response. The data is in GMT, and the conversion to PST (my current timezone) produces those times in the output. I guess I need to be more conscious of such things instead of just trusting the magic boxes. Any advice as to how I can avoid such problems? – Rguy May 23 '11 at 18:20