3

I'm trying to convert a string variable containing Dutch dates into a date variable. Some example values of the original variable (dta.string): "18 sep. 2016", "29 mei 2014", "7 mrt. 2016". I tried:

df$date <- as.Date(df$dta.string, format = "%d %h %Y", locale = "dutch")

Clearly I'm making a mistake, since the new column contains only NAs. Does anyone have a suggestion?

Benjamin Telkamp

2 Answers

5

You could do

df <- data.frame(dta.string = c("18 sep. 2016", "29 mei 2014", "7 mrt. 2016"))
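# Remember the current LC_TIME setting so it can be restored afterwards
oldloc <- Sys.getlocale("LC_TIME")
# Locale names are platform-specific: "dutch" works on Windows; elsewhere a name such as "nl_NL.UTF-8" may be needed
Sys.setlocale("LC_TIME", "dutch")
# Append a dot to three-letter month abbreviations that lack one ("mei" -> "mei.")
df$dta.string <- sub("([a-z]{3})(?!\\.)", "\\1.", df$dta.string, perl=TRUE)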
as.Date(df$dta.string, format = "%d %h. %Y")
# [1] "2016-09-18" "2014-05-29" "2016-03-07"
Sys.setlocale("LC_TIME", oldloc)
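
If a Dutch locale is not installed on the machine, one alternative (not part of the answer above) is to map the month abbreviations to month numbers by hand, so the session locale is never touched. A minimal sketch, assuming the data only uses the abbreviations jan, feb, mrt, apr, mei, jun, jul, aug, sep, okt, nov, dec with an optional trailing dot; `parse_dutch_date` is a hypothetical helper name:

```r
# Hypothetical helper: parse "18 sep. 2016"-style strings without changing LC_TIME.
parse_dutch_date <- function(x) {
  # Assumed set of Dutch month abbreviations (position = month number)
  months_nl <- c("jan", "feb", "mrt", "apr", "mei", "jun",
                 "jul", "aug", "sep", "okt", "nov", "dec")
  # day, month abbreviation with optional dot, four-digit year
  pattern <- "^([0-9]{1,2}) +([a-z]+)\\.? +([0-9]{4})$"

  x <- tolower(trimws(as.character(x)))
  out <- rep(as.Date(NA), length(x))
  ok <- grepl(pattern, x)

  day  <- as.integer(sub(pattern, "\\1", x[ok]))
  mon  <- match(sub(pattern, "\\2", x[ok]), months_nl)  # NA for unknown abbreviations
  year <- as.integer(sub(pattern, "\\3", x[ok]))

  # Build ISO strings; an unknown month yields "NA" in the string and hence an NA date
  out[ok] <- as.Date(sprintf("%04d-%02d-%02d", year, mon, day), format = "%Y-%m-%d")
  out
}

parse_dutch_date(c("18 sep. 2016", "29 mei 2014", "7 mrt. 2016"))
# [1] "2016-09-18" "2014-05-29" "2016-03-07"
```

Strings that do not match the assumed layout, or that use an abbreviation outside the table, come back as NA rather than raising an error.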
lukeA
  • macOS Sierra isn't letting me change the `LC_TIME` to dutch (i.e. I can't really test it w/o firing up a VM and I'm being lazy), but do you know if it's that locale switching that's enabling `%d` to work without the leading `0` for the day number? – hrbrmstr Oct 01 '16 at 14:54
  • Hmm using my standard locale `German_Germany.1252`, `as.Date(c("01 01 2016", "1 1 2016"), "%d %m %Y")` gives the expected result of two dates. I'm on R version 3.3.0 (2016-05-03), Platform: x86_64-w64-mingw32/x64 (64-bit), Running under: Windows 7 x64 (build 7601) Service Pack 1. – lukeA Oct 01 '16 at 14:58
  • 1
    Very interesting. Even the help on `strptime()` has: _"`%d` Day of the month as decimal number (01–31)."_ which is the same definition as the ISO/IEC 9899:1990 one for `strftime()` (which it ultimately uses). I had no idea it was locale dependent. Thanks for the extended example/test in that comment. – hrbrmstr Oct 01 '16 at 15:24
  • @hrbrmstr I'm not sure the leading 0 handling is locale-dependent; `as.Date('1-1-1', '%y-%m-%d')` works fine for me. I suspect it just coerces numeric tokens to numeric so it doesn't matter, but I'm not quite dedicated enough to go dig into the C code to find out. – alistaire Oct 01 '16 at 15:36
  • 2
    Argh! Yep, in `w_strptime_internal()` / `strptime_internal()` R is parsing out the numbers itself. I may poke the R-devel mailing list to see if they'd take a PR to change the `strptime` documentation. I had assumed that it just called the ISO routine directly, but they do quite a bit more processing of the input strings. – hrbrmstr Oct 01 '16 at 15:47
  • 1
    Ok, so that's only _kinda_ true. It's working when there are separators (space, "`-`", "`/`") _but_ `as.Date("4061776", "%d%m%Y")` won't work due to how R is iterating through the character buffer. With separators there's enough slack and the underlying other buffer format reading/converting R functions are tolerant enough to make it work. I'd caution against relying on the behaviour. – hrbrmstr Oct 01 '16 at 15:55
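
The padding behaviour discussed in these comments can be tried directly. A small sketch using the calls quoted above plus one zero-padded variant (the variable names are only for illustration); outputs are left out for the unpadded case, since the comments themselves advise against relying on it:

```r
# With separators, single-digit day and month values parse
# (first call is the one reported in the comments above)
with_sep <- as.Date(c("01 01 2016", "1 1 2016"), format = "%d %m %Y")

# Without separators, zero-padding every field to a fixed width
# leaves no ambiguity about where each field ends
padded <- as.Date("04061776", format = "%d%m%Y")

# The unpadded, separator-free form from the comment above,
# reported there as not working
unpadded <- as.Date("4061776", format = "%d%m%Y")

print(with_sep)
print(padded)
print(unpadded)
```

When the input format is under your control, zero-padded fields or explicit separators avoid depending on how the parser splits digits.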
5

`lubridate::dmy` has a `locale` parameter that lets you specify which locale to parse the strings with, without changing your session locale. It also copes with inconsistent separators, which is handy:

lubridate::dmy(c("18 sep. 2016", "29 mei 2014", "7 mrt. 2016"), locale = 'nl_NL.UTF-8')
## [1] "2016-09-18" "2014-05-29" "2016-03-07"
alistaire