1

I have a tibble in R with about 2,000 rows. It was imported from Excel using read_excel. One of the fields is a date field: dob. It imported as a string, and has dates in three formats:

"YYYY-MM-DD"
"DD-MM-YYYY"
"XXXXX"  (ie, a five-digit Excel-style date)

Let's say I treat the column as a vector.

dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
         "1981-02-01", "30621", "01-05-1986")

I can see that I probably need a solution that uses both parse_date_time and as.Date.

If I use parse_date_time:

dob_fixed <- parse_date_time(dob, c("ymd", "dmy"))

This fixes them all, except the five-digit one, which returns NA.

I can fix the five-digit one, by using as.integer and as.Date:

dob_fixed2 <- as.Date(as.integer(dob), origin = "1899-12-30")

Ideally I would run one and then the other, but because each returns NA on the strings that don't work I can't do that.

Any suggestions for doing all? I could simply change them in Excel and re-import, but I feel like that's cheating!

akrun
  • 874,273
  • 37
  • 540
  • 662
azul
  • 13
  • 2
  • `Ideally I would run one and then the other, but because each returns NA`. You can use the `NA` as index to run the second i.e. `is.na(dob_fixed)` i.e. `i1 <- is.na(dob_fixed); dob_fixed[i1] <- as.Date(as.integer(dob[i1], origin = "1899-12-30")` – akrun Jan 20 '19 at 10:46

1 Answers1

1

We create a logical index after the first run based on the NA values and use that to index for the second run

i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This is great, thank you. It worked perfectly. Out of curiosity, is there a way to do this without creating a new vector (ie, i1)? Would that require an if statement? – azul Jan 21 '19 at 02:49
  • @azul YOu can use `replace` i.e. `replace(dob_fixed, is.na(dob_fixed), as.Date(as.integer(dob[is.na(dob_fixed)]), origin = "1899-12-30"))` – akrun Jan 21 '19 at 05:51
  • 1
    Thanks, that's really helpful. – azul Jan 22 '19 at 12:03