
I have a character vector of dates in the format DDMMYYY (the millennium digit is omitted) that I need to convert to a vector of dates.

dates <- c("0410988", "2305009", "1111964", "0204015", "1803015", "0709015","0401015", "2012015", "3004158", "1205015")

These are the expected output dates:

1988-10-04, 2009-05-23, 1964-11-11, 2015-04-02, 2015-03-18, 2015-09-07, 2015-01-04, 2015-12-20, 2158-04-30, 2015-05-12

I tried removing the first Y character and using the regular as.Date() with format = "%d%m%y":

library(magrittr)  # provides the %>% pipe

dates <- c("0410988", "2305009", "1111964", "0204015", "1803015", "0709015", "0401015", "2012015", "3004158", "1205015") %>%
  # drop the 5th character (the millennium digit) to get DDMMYY
  sapply(function(x) paste0(substr(x, 1, 4), substr(x, 6, nchar(x)))) %>%
  as.Date(format = "%d%m%y")

But this clearly doesn't work: 1111964 gets converted to 2064-11-11 instead of 1964-11-11, and 3004158 gets converted to 2058-04-30 instead of 2158-04-30 (this date is ambiguous as it is).
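The 2064 and 2058 results follow from the rule documented in `?strptime` for `%y`: two-digit years 00 to 68 are prefixed with 20, and 69 to 99 with 19. A minimal sketch of that pivot, using illustrative two-digit inputs rather than the original data:

```r
# Two-digit years on either side of the %y pivot (illustrative inputs only)
x <- c("111164", "300458", "041088", "010169", "010168")
parsed <- as.Date(x, format = "%d%m%y")
format(parsed, "%Y")
# [1] "2064" "2058" "1988" "1969" "2068"
```

So any year ending in 00 to 68 lands in the 2000s unless the century is corrected afterwards.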

I also tried using substring() to extract the characters representing the day, the month and the year separately, and then plugging them into make_date(). However, this doesn't work either with only three digits for the year (here is how 1111964 would come out):

make_date("964", "11", "11")

[1] "964-11-11"

I can't just add 1000 to the year, since it won't work for years after 2000, so I assume there has to be a better way for such conversion.

Henrik
  • Your initial string cleaning from three-digit year to two-digit (`%y`) can be simplified to `paste0(substr(x, 1, 4), substring(x, 6))`. Then resolve the century part: [Add correct century to dates with year provided as "Year without century", %y](https://stackoverflow.com/questions/9508747/add-correct-century-to-dates-with-year-provided-as-year-without-century-y). – Henrik May 05 '23 at 11:35
  • [Convert two-digit years to four-digit years with correct century](https://stackoverflow.com/questions/12323693/convert-two-digit-years-to-four-digit-years-with-correct-century) – Henrik May 05 '23 at 11:48
  • Thank you @Henrik! I had troubles with cleaning from 3-digit to 2-digit as well, I knew it wasn't an optimal solution – the_mushroom_council May 05 '23 at 12:51
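The suggestion in the comments above can be sketched roughly as follows: parse with `%y`, then move any date that falls after a chosen cutoff back by one century. The `cutoff` value is an assumption made for illustration (the date of the comments); pick whatever latest plausible date fits the data. Note that this discards the millennium digit entirely, so element 9 comes out as 1958 rather than the 2158 listed in the question.

```r
dates <- c("0410988", "2305009", "1111964", "0204015", "1803015",
           "0709015", "0401015", "2012015", "3004158", "1205015")

# Drop the 5th character (the millennium digit) to get DDMMYY
ddmmyy <- paste0(substr(dates, 1, 4), substring(dates, 6))
parsed <- as.Date(ddmmyy, format = "%d%m%y")

# Assumption: no real date lies after this cutoff, so anything later
# is taken to belong to the previous century.
cutoff <- as.Date("2023-05-05")
yr <- as.integer(format(parsed, "%Y"))
fixed_year <- ifelse(parsed > cutoff, yr - 100L, yr)
fixed <- as.Date(paste0(fixed_year, format(parsed, "-%m-%d")))
fixed
```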

2 Answers


A base R alternative to the two-digit-year approaches linked in the comments is to use ifelse() to check whether the 5th character is a "9", then use gsub() to insert the missing millennium digit ("1" if so, "2" otherwise) and parse the result with a four-digit year:

as.Date(ifelse(substr(dates, 5,5) == "9", 
               gsub('^([0-9]{4})', '\\11', dates),
               gsub('^([0-9]{4})', '\\12', dates)),
        format = "%d%m%Y")

Output:

# [1] "1988-10-04" "2009-05-23" "1964-11-11" "2015-04-02" "2015-03-18"
# [6] "2015-09-07" "2015-01-04" "2015-12-20" "2158-04-30" "2015-05-12"
jpsmith
  • This is a good approach +1. Unfortunately it seems not quite correct. Look at element 9. I think it should be `1958-04-30` and not `2158-04-30` – TarJae May 05 '23 at 14:19
  • Great catch - thanks! Before correcting, element 9 seems to break the logic from the OP (if the OP states only the millennium character is missing, what millennium should this be in?). Perhaps this was a typo in the sample data - @the_mushroom_council, can you clarify? Thanks again, TarJae – jpsmith May 05 '23 at 14:26
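To see which inputs fall outside the "9" or "0" assumption discussed in these comments, one option is to inspect the 5th character directly. This is only a quick diagnostic sketch; the variable names `millennium_digit` and `odd` are made up for illustration.

```r
dates <- c("0410988", "2305009", "1111964", "0204015", "1803015",
           "0709015", "0401015", "2012015", "3004158", "1205015")

millennium_digit <- substr(dates, 5, 5)

# Elements whose 5th character is neither "9" nor "0"
odd <- !(millennium_digit %in% c("9", "0"))
data.frame(position = which(odd),
           input = dates[odd],
           digit = millennium_digit[odd])
```

In the sample data only element 9 ("3004158", digit "1") is flagged, which is why the two answers disagree on that one date.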

Here is an alternative approach:

library(dplyr)
library(lubridate)
my_func <- function(x){
  value <- substr(x, 5,5)
  x <- case_when(value == "1" | value == "9" ~ paste0(substring(x, 1, 4), "19", substring(x, 6)),
                 value == "0"  ~ paste0(substring(x, 1, 4), "20", substring(x, 6)),
                 TRUE ~ NA_character_)
  x <- gsub("(.{2})(.{2})(.{2})", "\\1-\\2-\\3", x)
  x <- dmy(x)
  return(x)
}

my_func(dates)

 [1] "1988-10-04" "2009-05-23"
 [3] "1964-11-11" "2015-04-02"
 [5] "2015-03-18" "2015-09-07"
 [7] "2015-01-04" "2015-12-20"
 [9] "1958-04-30" "2015-05-12"
TarJae
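For comparison, the same digit-to-century rule as in my_func() can be sketched in base R with a named lookup vector instead of case_when(). The `century` mapping below simply mirrors the rule above ("9" or "1" gives 19, "0" gives 20) and is an assumption about the data, not something the question confirms.

```r
dates <- c("0410988", "2305009", "1111964", "0204015", "1803015",
           "0709015", "0401015", "2012015", "3004158", "1205015")

# Assumed mapping from the 5th character to a century prefix;
# any other digit is not in the lookup and yields NA.
century <- c("9" = "19", "1" = "19", "0" = "20")
prefix <- unname(century[substr(dates, 5, 5)])

# Rebuild DDMMYYYY, keeping NA where the digit was not recognised
full <- ifelse(is.na(prefix), NA_character_,
               paste0(substr(dates, 1, 4), prefix, substring(dates, 6)))
result <- as.Date(full, format = "%d%m%Y")
result
```

Changing the rule then only means editing the `century` vector.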