
I have a folder with many files (read via list.files and lapply) that use a mix of two- and four-digit years. Dates in the 80s and 90s have two-digit years and dates in the 2000s have four-digit years, but the two are mixed throughout each file, so I can't choose a format by regexing the file name.

Is there a preferred way to handle this? I have the following ad hoc solution.

vec1 <- c("06/30/97", "12/31/99", "01/01/2000", "05/25/2001")
vec2 <- as.POSIXlt(as.Date(vec1, "%m/%d/%Y"))
vec3 <- vec2
vec3$year <- ifelse(vec3$year < 100, vec3$year + 1900, vec3$year)

This seems particularly janky. These cases work, but will this necessarily do the correct %y to %Y adjustment? I am afraid that this will silently fail due to leap years and the like. Thanks!

Richard Herron

2 Answers


We can modify my answer to a previous question to adapt to this more "ambiguous" case:

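multidate <- function(data, formats, latest = Sys.Date()){
    # Try each format in turn; each element keeps its first plausible parse.
    out <- as.Date(rep(NA, length(data)))
    for(fmt in formats){
        parsed <- as.Date(data, format = fmt)
        # Drop implausible parses: %Y reads "97" as year 97, %y reads "2000" as 2020.
        parsed[which(parsed > latest | parsed < as.Date("1000-01-01"))] <- NA
        fill <- is.na(out) & !is.na(parsed)
        out[fill] <- parsed[fill]
        }
    out
    }

multidate(vec1, c("%m/%d/%Y","%m/%d/%y"), latest = as.Date("2013-12-31"))
[1] "1997-06-30" "1999-12-31" "2000-01-01" "2001-05-25"
#or
multidate(vec1, c("%m/%d/%y","%m/%d/%Y"), latest = as.Date("2013-12-31"))
[1] "1997-06-30" "1999-12-31" "2000-01-01" "2001-05-25"

This works as long as every genuine date falls on or before latest and latest is earlier than 2020. The upper bound is what rejects the %y parse of a four-digit year, which comes out as 2020. The default of Sys.Date() only does that while the current date is before 2020, so pass an explicit cutoff just after the last date you expect in the data.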
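To see why the range check is needed at all, it may help to look at what each format does with the "wrong" kind of year. This is only an illustrative sketch; the names y4 and y2 are not part of the answer above.

```r
vec1 <- c("06/30/97", "12/31/99", "01/01/2000", "05/25/2001")

# %Y takes a two-digit year literally, so "97" becomes the year 97.
y4 <- as.Date(vec1, format = "%m/%d/%Y")
y4 < as.Date("1000-01-01")   # TRUE for the two-digit inputs only

# %y consumes just the first two digits of "2000" and "2001" and
# ignores the rest, so both land in 2020.
y2 <- as.Date(vec1, format = "%m/%d/%y")
format(y2)
```

Neither call returns NA for the mismatched strings, which is why the function has to reject parses by date range instead of relying on parse failures.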

plannapus

If you know that you only have to add "19" in front of the two-digit years, you can also do it with gsub:

vec1 <- c("06/30/97", "12/31/99", "01/01/2000", "05/25/2001")
gsub("(.*)/(..)$", "\\1/19\\2", vec1)
# [1] "06/30/1997" "12/31/1999" "01/01/2000" "05/25/2001"
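A possible follow-up, sketched here under assumptions and not part of the original answer, is to parse the padded strings with a single four-digit format and flag anything outside the expected range. The names padded, dates and suspect are illustrative, and the 1980 to 2009 window is an assumption based on the data described in the question.

```r
vec1 <- c("06/30/97", "12/31/99", "01/01/2000", "05/25/2001")

# Prefix "19" to two-digit years only; four-digit years do not match the pattern.
padded <- gsub("(.*)/(..)$", "\\1/19\\2", vec1)

# Every string now carries a four-digit year, so one format is enough.
dates <- as.Date(padded, format = "%m/%d/%Y")

# Flag values that failed to parse or fall outside the assumed 1980-2009 window.
suspect <- is.na(dates) |
    dates < as.Date("1980-01-01") |
    dates > as.Date("2009-12-31")
vec1[suspect]
```

Any input listed by the last line would need a manual look, for example a two-digit year that actually belongs to the 2000s.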
juba
  • Thanks, Juba. It's always obvious in hindsight. :) (And I can find violations easily... all data are in 80s, 90s, and 2000s.) – Richard Herron Feb 15 '13 at 13:27