
I need to parse character strings in ISO 8601 format with R. The strings have the following format:

%Y-%m-%dT%H:%M:%S%z

One example that causes problems is the following:

2000-01-02T13:00:00.000+13:00

I can remove the `:` in the UTC offset, but the result is still not useful.

I am using RStudio 1.2.1335 running in a Docker container built from rocker/geospatial:latest via this Dockerfile. The R environment is:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] parsedate_1.2.0  sos4R_0.4.0.9002 stringr_1.4.0    httr_1.4.0       webmockr_0.3.4   testthat_2.1.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         compiler_3.6.0     pillar_1.4.0       remotes_2.0.4      prettyunits_1.0.2  tools_3.6.0        digest_0.6.18     
 [8] packrat_0.5.0      pkgbuild_1.0.3     uuid_0.1-2         pkgload_1.0.2      memoise_1.1.0      tibble_2.1.1       lattice_0.20-38   
[15] anytime_0.3.3      pkgconfig_2.0.2    rlang_0.3.4        cli_1.1.0          rstudioapi_0.10    withr_2.1.2        xml2_1.2.0        
[22] fs_1.3.1           fauxpas_0.2.0      desc_1.2.0         devtools_2.0.2     rprojroot_1.3-2    grid_3.6.0         glue_1.3.1        
[29] R6_2.4.0           processx_3.3.1     sessioninfo_1.1.1  sp_1.3-1           callr_3.2.0        magrittr_1.5       rematch2_2.0.1    
[36] usethis_1.5.0      ps_1.3.0           backports_1.1.4    assertthat_0.2.1   RApiDatetime_0.0.4 stringi_1.4.3      crayon_1.3.4

I tried lubridate, parsedate, and anytime, but none of them seems to parse the timezone correctly.

In the end, I tried the solution presented here on Stack Overflow, with the following result:

> timestring <- "2000-01-02T13:00:00.000+13:00"
> fmt <- "%Y-%m-%dT%H:%M:%S%z"
> cleanedTimestring <- gsub("(.*).(..)$","\\1\\2",timestring)
> parsedTime <- strptime(cleanedTimestring, fmt, tz = "UTC")
> str(parsedTime)
 POSIXlt[1:1], format: NA

Changing the UTC offset to 12:00 and removing the milliseconds (.000) produces a nearly useful result:

> timestring <- "2000-01-02T13:00:00+12:00"
> cleanedTimestring <- gsub("(.*).(..)$","\\1\\2",timestring)
> parsedTime <- strptime(cleanedTimestring, fmt, tz = "UTC")
> str(parsedTime)
 POSIXlt[1:1], format: "2000-01-02 01:00:00"
> parsedTime
[1] "2000-01-02 01:00:00 UTC"

But this is too much preprocessing, and I am losing information because the milliseconds are dropped.
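For reference, base R's `%OS` conversion accepts fractional seconds on input, so the milliseconds do not have to be stripped. The following is only a minimal sketch: it assumes the offset is one that `strptime` accepts for `%z` (the `+12:00` case from above), and it swaps the generic `gsub` pattern for one that targets only the colon inside the trailing offset. The sample value with `.250` is made up for illustration.

```r
# Illustrative input: fractional seconds plus an offset that strptime accepts.
timestring <- "2000-01-02T13:00:00.250+12:00"

# Remove only the colon inside the trailing "+hh:mm" / "-hh:mm" offset.
cleaned <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", timestring)

# %OS (instead of %S) reads the fractional seconds; %z reads "+1200".
parsed <- as.POSIXct(strptime(cleaned, "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC"))

# Print with three decimal places to check that the fraction survived.
format(parsed, "%Y-%m-%d %H:%M:%OS3")
```

This does not address the `+13:00` case, which still has to be handled separately.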

I expect to be able to parse a string like 2000-01-02T13:00:00.000+13:00 to a POSIXct that results in the following:

[1] "2000-01-02 13:00:00.000 NZDT"

or

[1] "2000-01-02T13:00:00.000+13:00"

Is there any lib that is able to do this?

Keep in mind that I do not know the timezone of the timestamps beforehand. It might be given as a UTC offset or as an ID (e.g. +13:00 or +1300 vs. NZDT).

Do you need any additional details?

Eike
    I would give a sample vector of a few values you need the same code to parse and give the desired output for each of the inputs. That way if our code gives the correct answers for all those values, we can assume we've solved the problem. – MrFlick May 21 '19 at 15:36
    Perhaps it's a bug in `strptime` that it does not accept ISO8601 in this example. @MrFlick, do you agree? [Apparently](https://en.wikipedia.org/wiki/List_of_UTC_time_offsets) there are at least three zones that have offsets greater than 12. This disagrees with the [international date line](https://en.wikipedia.org/wiki/International_Date_Line) showing no zones outside of +/- 12. Sigh, why do time zones/offsets have to be so problematic? – r2evans May 21 '19 at 15:48

1 Answer

If you just need to accommodate timestamps both with and without milliseconds, then you can do

timestring <- c(
  "2000-01-02T13:00:00.000+13:00",
  "2000-01-02T13:00:00+12:00"
)
formats <- c("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%OS%z")
lubridate::parse_date_time(timestring, formats, exact = TRUE)
# [1] "2000-01-02 00:00:00 UTC" "2000-01-02 01:00:00 UTC"
MrFlick
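If only numeric offsets need to be handled, one way to avoid depending on how `%z` treats offsets such as `+13:00` is to split the offset off and apply it by hand. The sketch below uses a hypothetical helper, `parse_iso8601_offset()`, which is not part of any package. It assumes the input ends in `Z`, `+hh:mm`, or `+hhmm`, returns instants in UTC (so the original offset is not retained), and does not handle zone abbreviations such as NZDT.

```r
# Hypothetical helper: parse ISO 8601 strings ending in "Z", "+hh:mm" or
# "+hhmm" into POSIXct (UTC) without relying on strptime's %z.
parse_iso8601_offset <- function(x) {
  pattern <- "^(.*T[0-9:.]+)(Z|[+-][0-9]{2}:?[0-9]{2})$"
  ok <- grepl(pattern, x)
  local_part  <- sub(pattern, "\\1", x)   # e.g. "2000-01-02T13:00:00.000"
  offset_part <- sub(pattern, "\\2", x)   # e.g. "+13:00"

  # Parse the wall-clock part as if it were UTC; %OS keeps fractional seconds.
  local_time <- as.POSIXct(local_part, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

  # Convert the offset to signed seconds ("Z" means zero offset).
  digits <- gsub("[^0-9]", "", offset_part)
  digits[offset_part == "Z"] <- "0000"
  sign <- ifelse(startsWith(offset_part, "-"), -1, 1)
  offset_secs <- sign * (as.numeric(substr(digits, 1, 2)) * 3600 +
                           as.numeric(substr(digits, 3, 4)) * 60)

  # Local time minus offset gives the instant in UTC.
  result <- local_time - offset_secs
  result[!ok] <- NA   # inputs that do not match the assumed shape
  result
}

x <- c("2000-01-02T13:00:00.000+13:00",
       "2000-01-02T13:00:00+1200",
       "2000-01-02T13:00:00Z")
parsed <- parse_iso8601_offset(x)
parsed
```

Because the arithmetic is done on the parsed wall-clock time, any offset magnitude works, but inputs that carry a zone ID instead of a numeric offset come back as `NA` and would need a separate lookup.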