I have tons of .xls files with dates that I'm reading into R. The issue is that each person filled in the "date" column differently, so I have values like:

date <- c(1995, 1995-05-03, 03-05-1995, 1995/5)

I've been trying to find a way to fix this:

  1. by using as.Date()
  2. by using convertToDate()

but this generates, of course, multiple NAs.

Is there a way to fix this?

duckmayr

2 Answers

You could try the anytime package, assuming you can add quotation marks to your data so that each value is read as a string:

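library(anytime)
# The vector from the question, with each value quoted so R keeps it as text
date <- c("1995", "1995-05-03", "03-05-1995", "1995/5")
date <- anydate(date)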
head(date)
#> [1] "1995-01-01" "1995-05-03" "1995-03-05" "1995-05-01"

Created on 2020-07-27 by the reprex package (v0.3.0)

bttomio

You can try lubridate::parse_date_time(), which allows you to specify multiple date and time formats that can occur in your data (and you don't need to keep the order of formats):

date <- c("1995", "1995-05-03", "03-05-1995", "1995/5", "1996.12.01", "1.3.1993")

lubridate::parse_date_time(date, orders = c("Y", "Ymd", "dmY", "Y/m"))
#> [1] "1995-01-01 UTC" "1995-05-03 UTC" "1995-05-03 UTC" "1995-05-01 UTC"
#> [5] "1996-12-01 UTC" "1993-03-01 UTC"
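
If adding a package is not an option, the same "try several formats" idea can be sketched in base R. This is only a rough illustration, not a drop-in replacement for either package: `normalise_dates()` is a hypothetical helper name, it covers only the four shapes from the question, and it assumes that `03-05-1995` means day-month-year and that a missing month or day should default to 1. Each shape is checked with a regular expression before calling `as.Date()`, because `as.Date()` with a format that does not fit the string can return `NA` or an unexpected date.

```r
# Hypothetical helper: convert mixed date strings to Date, one shape at a time.
normalise_dates <- function(x) {
  x <- trimws(as.character(x))
  out <- rep(as.Date(NA), length(x))

  # Year only, e.g. "1995": assume 1 January.
  i <- grepl("^[0-9]{4}$", x)
  if (any(i)) out[i] <- as.Date(paste0(x[i], "-01-01"), format = "%Y-%m-%d")

  # Year and month, e.g. "1995/5": assume the first day of the month.
  i <- grepl("^[0-9]{4}/[0-9]{1,2}$", x)
  if (any(i)) out[i] <- as.Date(paste0(x[i], "/01"), format = "%Y/%m/%d")

  # Year-month-day, e.g. "1995-05-03".
  i <- grepl("^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$", x)
  if (any(i)) out[i] <- as.Date(x[i], format = "%Y-%m-%d")

  # Day-month-year, e.g. "03-05-1995" (assumed to be day first).
  i <- grepl("^[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}$", x)
  if (any(i)) out[i] <- as.Date(x[i], format = "%d-%m-%Y")

  # Anything that matched no pattern stays NA.
  out
}

date <- c("1995", "1995-05-03", "03-05-1995", "1995/5")
parsed <- normalise_dates(date)

# Inspect the inputs that could not be parsed, if any.
date[is.na(parsed)]
```

Values that remain `NA` are the ones that need another pattern or a manual fix, so it is worth checking them before overwriting the original column.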