2

I'm trying to convert a character column with dates to date format. However, the dates are in an ambiguous format. Some entries are of the format %d.%m.%Y (e.g., "03.02.2021"), while others are %d %b %Y (e.g., "3 Feb 2021").

I've tried as.Date(tryFormats=c("%d %b %Y", "%d.%m.%Y")), but realized that tryFormats is only flexible for the first entry, so that the entries of type %d %b %Y are correctly identified but those of %d.%m.%Y become NAs, or vice versa. I've also tried the anytime package, but that produced NAs in a similar fashion.

I've made sure that the column doesn't contain any NAs or empty strings, and I don't receive any error message.

Koechi
  • 25
  • 4
  • 5
    Are these the only two formats? Then try something like `y <- as.Date(x, format = "%d.%m.%Y"); y[is.na(y)] <- as.Date(x[is.na(y)], format ="%d %b %Y")`. – Roland Sep 16 '21 at 12:09
  • 3
    In addition to Roland's suggestion, if you also have dates with the same format but with different separators (e.g., "13/3/2020" and "13-3-2020") you may want to fix that using `gsub` or `stringr::str_replace` – Claudio Sep 16 '21 at 12:26

4 Answers4

1

Try the parsedate package :

df <-read.table(header=TRUE,text=
"d
03.02.2021
'3 Feb 2021'
13/3/2021
13-3-2020")

df %>% mutate(date=parsedate::parse_date(d))
##           d       date
##1 03.02.2021 2021-02-03
##2 3 Feb 2021 2021-02-03
##3  13/3/2021 2021-03-13
##4  13-3-2020 2020-03-13
Nicolas2
  • 2,170
  • 1
  • 6
  • 15
1

Similar (but expanded) to Roland's suggestion, my answer here (in the (2) section) suggests a way to deal with multiple candidate formats.

## sample data
x <- c("03.02.2021", "3 Feb 2021")

formats <- c("%d.%m.%Y", "%d %b %Y")
dates <- as.Date(rep(NA, length(x)))
for (fmt in formats) {
  nas <- is.na(dates)
  dates[nas] <- as.Date(x[nas], format=fmt)
}
dates
# [1] "2021-02-03" "2021-02-03"

It is better to have the most-frequent format first in the formats vector. One could add a quick-escape to the loop if there are many formats, such as

for (fmt in formats) {
  nas <- is.na(dates)
  if (!any(nas)) break
  dates[nas] <- as.Date(x[nas], format=fmt)
}

but I suspect that it really won't be very beneficial unless both formats and x are rather large (I have no sizes in mind to quantify "large").

r2evans
  • 141,215
  • 6
  • 77
  • 149
1

did you try lubridate ?

df <-read.table(header=TRUE,text=
                  "d
03.02.2021
'3 Feb 2021'
13/3/2021
13-3-2020")

dmy(df$d)

[1] "2021-02-03" "2021-02-03" "2021-03-13" "2020-03-13"
denis
  • 5,580
  • 1
  • 13
  • 40
1

Using anydate

library(anytime)
addFormats(c("%d/m/%Y", '%d-%m-%Y') )
anydate(df$d)
[1] "2021-02-03" "2021-02-03" "2021-03-13" "2020-03-13"
akrun
  • 874,273
  • 37
  • 540
  • 662