I am working on software that performs some time series manipulations. I recently discovered a serious issue in an R script I had developed. The unexpected behaviour was isolated to one specific machine whose time zone was set to Europe/Moscow. The issue boils down to the following snippet:
strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984")
Sys.setenv(TZ='Europe/Moscow')
d <- strptime(strange_days, '%m/%d/%Y')
d
[1] "1984-02-01 MSK" "1984-03-01 MSK" "1984-04-01" "1984-05-01 MSD" "1984-06-01 MSD"
Everything seems to be recognized correctly. Since this is daily data, I assumed the time zone attribute would not make much difference. That was a painful mistake:
as.numeric(d)
[1] 444430800 446936400 NA 452203200 454881600
The NA in the third position then causes a failure later on, when the data are converted to an xts object.
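Because the problem only surfaced later, at the xts conversion step, it can help to check for unrepresentable timestamps immediately after parsing. The following is a minimal sketch using base R only. `assert_parsed_times` is a hypothetical helper name, and converting with `as.POSIXct` before `as.numeric` is a choice made here, not something taken from the original script. The example parses in GMT so it behaves the same on any machine. With the Europe/Moscow setup on the affected Windows machine, the same check would be expected to stop at position 3.

```r
# Hypothetical helper: fail fast if any parsed timestamp cannot be
# converted to seconds since the epoch (the NA seen above).
assert_parsed_times <- function(x, raw) {
  secs <- as.numeric(as.POSIXct(x))
  bad <- which(is.na(secs))
  if (length(bad) > 0) {
    stop("Unrepresentable timestamps at positions ",
         paste(bad, collapse = ", "), ": ",
         paste(raw[bad], collapse = ", "))
  }
  invisible(secs)
}

strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984")
d <- strptime(strange_days, "%m/%d/%Y", tz = "GMT")
secs <- assert_parsed_times(d, strange_days)
```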
My current fix is to force all time zones to GMT. This can be done either per call, via strptime(strange_days, '%m/%d/%Y', tz='GMT'), or globally, via Sys.setenv(TZ='GMT'). Either way, the issue goes away.
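For reference, here is the per-call version of that fix in isolation. This is a minimal sketch. It passes `tz = "GMT"` to `strptime`, so the result no longer depends on the session's `TZ` setting. It then converts to POSIXct (a choice made here) before taking seconds. GMT has no daylight saving time, so every midnight exists and each value is a whole number of days since the epoch.

```r
strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984")

# Parse explicitly in GMT; the session time zone is not consulted.
d_gmt <- as.POSIXct(strptime(strange_days, "%m/%d/%Y", tz = "GMT"))
secs <- as.numeric(d_gmt)

# Every value is an exact midnight: divisible by 86400 seconds per day.
stopifnot(!anyNA(secs), all(secs %% 86400 == 0))

# The calendar dates come back unchanged when formatted in the same zone.
format(d_gmt, "%Y-%m-%d", tz = "GMT")
```

Passing `tz` per call keeps the change local. `Sys.setenv(TZ='GMT')` instead affects every later date-time operation in the session.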
Is this good practice? Will the code be reliable in all situations? What techniques would you recommend to avoid similar problems?
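For purely daily data, one commonly suggested alternative is to skip date-times altogether and parse into R's Date class. A Date stores whole days since 1970-01-01 and has no time of day or time zone. This is offered as a sketch, not a guaranteed fix. Whether the downstream xts code in this project accepts a Date index is an assumption here.

```r
strange_days <- c("2/1/1984", "3/1/1984", "4/1/1984", "5/1/1984", "6/1/1984")

# Date objects count whole days since 1970-01-01, with no session time
# zone involved, so a skipped local midnight cannot produce an NA here.
d_date <- as.Date(strange_days, format = "%m/%d/%Y")

days <- as.numeric(d_date)   # days since the epoch
stopifnot(!anyNA(days))

# If seconds are needed downstream, days * 86400 gives midnight UTC.
secs <- days * 86400
```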
And what is so special about the 1st of April 1984?
Edit: this and this question suggest that daylight saving time is probably what causes the problem.
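If daylight saving time is the cause, the affected dates should be exactly those whose local midnight does not exist in the given time zone. The following sketch probes for such dates. `midnight_gaps` is a hypothetical helper, not a base R function. Platforms handle nonexistent local times differently, so it flags both results that come back as NA and midnights that round-trip to a different wall-clock time.

```r
# Hypothetical helper: return the dates whose local midnight in `tz`
# either fails to convert (NA) or comes back as another wall-clock time,
# which is what a DST transition that skips 00:00 would cause.
midnight_gaps <- function(dates, tz) {
  dates <- as.Date(dates)
  stamp <- paste(format(dates, "%Y-%m-%d"), "00:00:00")
  ct <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  back <- format(ct, "%Y-%m-%d %H:%M:%S", tz = tz)
  bad <- is.na(ct) | back != stamp
  dates[bad]
}

days_1984 <- seq(as.Date("1984-01-01"), as.Date("1984-12-31"), by = "day")
midnight_gaps(days_1984, "Europe/Moscow")
```

The output for Europe/Moscow depends on the platform's time zone database and on how it treats nonexistent local times.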
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.1.0
Edit 2: the issue is clearly Windows-specific. It does not reproduce on Linux with the following setup:
R version 3.1.0 (2014-04-10)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.1.0
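Neither sessionInfo() listing above records the time zone the script ran under, and that turned out to be the key variable. The following small sketch captures it alongside sessionInfo(). `tz_report` is a hypothetical helper built only from base R functions. `Sys.timezone()` may return NA on some systems.

```r
# Hypothetical helper: collect the time-zone-related settings that the
# sessionInfo() output above does not show.
tz_report <- function() {
  list(
    os_type   = .Platform$OS.type,              # "windows" or "unix"
    platform  = R.version$platform,
    TZ_env    = Sys.getenv("TZ", unset = NA),   # NA if TZ is not set
    system_tz = Sys.timezone(),                 # may be NA on some systems
    r_version = R.version.string
  )
}

str(tz_report())
```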