I have a variable with dates from 1960 to 2000 (dd/mm/yy) stored as character strings, and I am trying to convert them into dates using the following expression:

MUJERES$Fecha_nacimiento <- as.Date(MUJERES$Fecha_nacimiento, "%d/%m/%y")

With this code, some values are assigned the wrong century; for example, "8/8/68" becomes "2068-08-08".

How can I get the correct year (1968 instead of 2068)?

  • Don't use two-digit years. There's no way to determine if that's 1968 or 2068. Retrieve the data again and ensure four-digit years are used and, preferably, use YYYY-MM-DD as the format – Panagiotis Kanavos May 27 '22 at 14:40
  • If you know which years are definitely in the 20th century, you could explicitly add the century as a prefix: `x <- gsub("/(..)$", "/19\\1", x); as.Date(x, "%d/%m/%Y")` – dash2 May 27 '22 at 14:41
  • This isn't nitpicking. What you posted reintroduced the Y2K bug, the actual most expensive bug in computing. There's no excuse for two-digit years in 2022. You can't just assume some cutoff year and say, e.g., values below 40 are in the 21st century and values above are in the 20th. In 2020, before COVID, the biggest news in IT was how [Lloyd's](https://www.theregister.com/2020/01/02/lloyds_outage/) and the UK's [DVLA](https://www.theregister.com/2020/01/13/y2k_dvla/) crashed in January because their systems started treating 20 as 1920. – Panagiotis Kanavos May 27 '22 at 14:45
  • @dash2 you don't. That's the problem. No matter what, this would only be covering an embarrassing Y2K bug. – Panagiotis Kanavos May 27 '22 at 14:45
  • Sure. But maybe the OP can't control her data source, and maybe she knows from the context that 68 means 1968 not 2068. – dash2 May 27 '22 at 15:06
  • The conversion is not "wrong"; it conforms to the POSIX standards. See [this post](https://stackoverflow.com/questions/46662026/as-date-with-two-digit-years), especially the comments. The fact is that 2-digit years are ambiguous, so you have to deal with them explicitly based on your understanding of your data set. – jlhoward May 28 '22 at 05:29
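
The cutoff mentioned in the last comment can be seen directly in base R. This is a minimal sketch, assuming the `%y` rule documented in `?strptime` (two-digit years 00 to 68 get the prefix 20, and 69 to 99 get the prefix 19); the sample strings are made up for illustration:

```r
# Illustrative two-digit-year strings on both sides of the 68/69 cutoff
x <- c("8/8/68", "8/8/69", "31/12/99", "1/1/00")

# %y is the two-digit year; the century is inferred by the rule above
parsed <- as.Date(x, format = "%d/%m/%y")

format(parsed, "%Y")
#> [1] "2068" "1969" "1999" "2000"
```

So any birth year from 1960 to 1968 in the question's data lands in the 2060s unless the century is supplied explicitly.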

1 Answer

A possible (if wordy) solution is to separate the strings into days, months, and years, use paste0 to add the 19 prefix to the year, and then use lubridate's make_date to knit them back together and convert them into dates.

MUJERES <- data.frame(fecha_nacimiento = c("8/8/68", "31/12/65"))

library(dplyr)
library(tidyr)

MUJERES |>
  separate(fecha_nacimiento, into = c("d", "m", "y")) |> 
  mutate(y = if_else(y == "00", "2000", paste0("19", y)),
         fecha_nacimiento = lubridate::make_date(y, m, d)) |> 
  select(-c(d, m, y))

Output

#>   fecha_nacimiento
#> 1       1968-08-08
#> 2       1965-12-31

The usual caveat applies: you can only do this if you have external information that all dates in your dataset are definitely in the 20th century!
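
If you would rather avoid the extra packages, the same idea can be sketched in base R. The helper name `expand_year` is hypothetical, and it relies on the same assumption as above: every date falls between 1901 and 2000, so "00" is read as 2000 and every other two-digit year as 19xx.

```r
# Hypothetical helper: expand a two-digit year before parsing.
# Assumption: "00" means 2000 and every other value means 19xx.
expand_year <- function(x) {
  yy <- sub(".*/", "", x)                    # text after the last slash
  century <- ifelse(yy == "00", "20", "19")  # choose the century prefix
  # keep "d/m/" and append the four-digit year
  full <- paste0(sub("/[^/]*$", "/", x), century, yy)
  as.Date(full, format = "%d/%m/%Y")         # %Y is the four-digit year
}

expand_year(c("8/8/68", "31/12/65", "1/1/00"))
#> [1] "1968-08-08" "1965-12-31" "2000-01-01"
```

Because the century is written into the string before parsing, the result does not depend on the `%y` cutoff at all.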

Andrea M