Since you mentioned fasttime, I got curious about testing it against a few other common options. It requires dates in year-month-day format, but you can get there with some regex.
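The reformatting is just capture groups pulling out the month, day, and year pieces and reordering them, e.g.:

```r
# rearrange mm/dd/yyyy into the yyyy-mm-dd format fasttime expects
gsub("^(\\d{2})/(\\d{2})/(\\d{4})", "\\3-\\1-\\2", "09/05/2019")
#> [1] "2019-09-05"
```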
I did benchmark testing on a smaller, but still cumbersome, set of dates. I'm on a year-old MacBook Pro with plenty of other stuff running, and 100 trials of all 3 methods on 1 million dates still finished before I was done eating a sandwich.
set.seed(9)
days <- sample(1:30, 1e6, replace = TRUE)
date_str <- sprintf("09/%02d/2019", days)
# as.Date(date_str, format = "%m/%d/%Y")
# lubridate::mdy(date_str)
# fasttime::fastPOSIXct(gsub("^(\\d{2})/(\\d{2})/(\\d{4})", "\\3-\\1-\\2", date_str))
bench <- microbenchmark::microbenchmark(
  base = as.Date(date_str, format = "%m/%d/%Y"),
  lubr = lubridate::mdy(date_str),
  fast = fasttime::fastPOSIXct(gsub("^(\\d{2})/(\\d{2})/(\\d{4})", "\\3-\\1-\\2", date_str)),
  times = 100
)
bench
#> Unit: nanoseconds
#> expr min lq mean median uq max neval cld
#> base 3 5 7.02 5 6 180 100 a
#> lubr 4 5 6.91 6 6 148 100 a
#> fast 4 5 8.77 5 6 332 100 a
Based on the mean and the lowest maximum, lubridate::mdy runs the fastest without having to do any reformatting or specify a format string. Based on the median, base as.Date runs fastest but requires you to set the format string (not a big deal), as does fasttime, but with the regex stipulation. Make of that what you will.
I'd also note that fasttime converts to POSIXct, so since there isn't a time element in the input, it tacks one on; removing it might then become another time-consuming step.
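For instance, a quick sketch of that extra step (passing tz explicitly to as.Date here, since fasttime interprets its input as UTC):

```r
# fastPOSIXct returns a POSIXct with a midnight timestamp tacked on
x <- fasttime::fastPOSIXct("2019-09-05")
class(x)
#> [1] "POSIXct" "POSIXt"

# stripping the time back off means another pass over the data
as.Date(x, tz = "UTC")
#> [1] "2019-09-05"
```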