I am working with .csv data that was exported from Teradata. Several columns were originally timestamps with timezones, so after loading the .csv in R I'd like to convert these columns (which are loaded as strings) to POSIXlt or POSIXct. I am using strptime, but the format of the timezone in the .csv file does not match what strptime expects. For example, it expects -0400 but the .csv has the format -04:00, where a colon separates the hours and minutes.
I can remove the colon, but this is an extra step and complication I'd like to avoid if possible. Is there a way to tell strptime to use a different format for the timezone (%z)?
Here is an example:
## Example data:
x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")
format <- "%Y-%m-%d %H:%M:%OS%z"
## Doesn't work:
strptime(x,format)
## [1] NA NA
## Ignores the timezone:
as.POSIXct(x)
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 22:17:13 EDT"
## Remove the last colon:
x2 <- gsub("(.*):", "\\1", x)
x2
## [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"
## This works, but requires extra processing (removing the colon)
strptime(x2,format)
## [1] "2011-10-12 22:17:13" "2011-10-12 18:17:13"
So I'm looking to achieve this last result using something like strptime(x,"%Y-%m-%d %H:%M:%OS%zz"), where %zz is a custom expression for the timezone that recognizes the -04:00 format. Or %zH:%zM might be even better.
If this isn't possible, does anyone have a slick/flexible function for converting strings (of various formats) to dates for multiple columns of a data.frame/data.table?
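For illustration, here is one way the colon-removal workaround above could be wrapped up so it can be applied to several columns at once. This is only a sketch, not a built-in feature: parse_tz_colon, the df data frame, and its column names are hypothetical, and it assumes every offset sits at the very end of the string as +HH:MM or -HH:MM. It returns POSIXct in UTC so the result can be stored in a data.frame column.

```r
## Hypothetical helper: parse timestamps whose UTC offset is written as +HH:MM / -HH:MM.
parse_tz_colon <- function(x, format = "%Y-%m-%d %H:%M:%OS%z", tz = "UTC") {
  ## Remove the colon only inside a trailing offset, leaving the time-of-day colons alone.
  x2 <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  ## strptime() applies the %z offset; as.POSIXct() makes the result column-friendly.
  as.POSIXct(strptime(x2, format, tz = tz))
}

x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")
parsed <- parse_tz_colon(x)
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
## [1] "2011-10-13 02:17:13" "2011-10-12 22:17:13"

## Applying it to several columns of a (made-up) data.frame:
df <- data.frame(id = 1:2, created = x, updated = x, stringsAsFactors = FALSE)
ts_cols <- c("created", "updated")
df[ts_cols] <- lapply(df[ts_cols], parse_tz_colon)
```

Anchoring the pattern at the end of the string is a deliberate choice here: unlike removing "the last colon", it leaves strings without an offset untouched instead of corrupting their time component.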