1

I have a vector of dates which I imported from excel and it comes in a very weird format. Some of them come as char as dd/mm/yyyy and some come as a char as, for instance, 45265 which is the number corresponding to this date in excel.

I want to apply a function to convert this vector to proper dates in R. The solution I tried returns an error and I cannot understand it.

t1=c("14/02/2020", "17/02/2020", "18/02/2020", "19/02/2020", "20/02/2020", 
     "21/02/2020", "26/02/2020", "27/02/2020", "28/02/2020", "43864", 
     "43893", "43924", "43954", "43985", "44077")
lapply(t1,function(x) ifelse(grepl("/",x),dmy(x),as.Date(as.numeric(x),origin='1900-01-01')))

1 Answers1

2

Two things:

  1. most of what you want to do can be done as a vector, no need to apply;
  2. ifelse is class-unsafe, trying to use it with Date-class (or POSIXt-class), for example, will strip the class and return numbers. See How to prevent ifelse() from turning Date objects into numeric objects.

I suggest this as an alternative:

out <- rep(as.Date(NA), length(t1))
out[grepl("/", t1)] <- as.Date(t1[grepl("/", t1)], format = "%d/%m/%Y")
out[is.na(out)] <- as.Date(as.numeric(t1[is.na(out)]), origin = "1900-01-01")
out
#  [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
#  [9] "2020-02-28" "2020-02-05" "2020-03-05" "2020-04-05" "2020-05-05" "2020-06-05" "2020-09-05"

If you have more candidate formats, you might consider https://stackoverflow.com/a/52319606/3358272 and https://stackoverflow.com/a/70304571/3358272, which iterates over possible formats (in a similar way) and attempts to convert them all until completion (or exhaustion).

An alternative to using base::ifelse (which strips class) is to use either dplyr::if_else or data.table::fifelse, which might be simpler if you are using either package for other uses. Note that they will run both methods on all of t1, so you will get warnings (both implementations).

if_else(grepl("/", t1), lubridate::dmy(t1), as.Date(as.numeric(t1), origin = "1900-01-01"))
# WARN [2023-05-17 09:54:01] {"msg":"uncaught warning","warning":" 6 failed to parse.","where":["ccbr()","if_else(grepl(\"/\", t1), lubridat","lubridate::dmy(t1)"],"pid":"39316"}
# WARN [2023-05-17 09:54:01] {"msg":"uncaught warning","warning":"NAs introduced by coercion","where":["ccbr()","if_else(grepl(\"/\", t1), lubridat","as.Date(as.numeric(t1), origin ="],"pid":"39316"}
#  [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
#  [9] "2020-02-28" "2020-02-05" "2020-03-05" "2020-04-05" "2020-05-05" "2020-06-05" "2020-09-05"
data.table::fifelse(grepl("/", t1), lubridate::dmy(t1), as.Date(as.numeric(t1), origin = "1900-01-01"))
# WARN [2023-05-17 09:54:11] {"msg":"uncaught warning","warning":" 6 failed to parse.","where":["ccbr()","data.table::fifelse(grepl(\"/\", t","lubridate::dmy(t1)"],"pid":"39316"}
# WARN [2023-05-17 09:54:11] {"msg":"uncaught warning","warning":"NAs introduced by coercion","where":["ccbr()","data.table::fifelse(grepl(\"/\", t","as.Date(as.numeric(t1), origin ="],"pid":"39316"}
#  [1] "2020-02-14" "2020-02-17" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21" "2020-02-26" "2020-02-27"
#  [9] "2020-02-28" "2020-02-05" "2020-03-05" "2020-04-05" "2020-05-05" "2020-06-05" "2020-09-05"

This can be suppressed by wrapping the whole if_else/fifelse with suppressWarnings.

r2evans
  • 141,215
  • 6
  • 77
  • 149