2
Date  
01/01/2013 #mm/dd/yyyy  
12122015 #mmddyyyy  
2014-03-21 #yyyy-mm-dd  
95.10.12 #yy.mm.dd  

I have a column "Date" with different formats. How can I clean this and convert them into a single date format?
Additional info:class(Date) is factor.

akrun
  • 874,273
  • 37
  • 540
  • 662
  • We can use `library(lubridate)` and `?guess_formats` – akrun Aug 12 '15 at 12:37
  • You can use `foo(parse_date_time(Date, guess_formats(Date, c('dmy', 'ymd'))))` where `foo` is from the link. and `Date <- c('01/01/2013', '12122015', '2014-03-21', '95.10.12')` – akrun Aug 12 '15 at 12:47
  • 1
    And also in cases like "03-03-2013"..how can I differentiate between month and date(since both are same) when cleansing the data? – Ramprakash V Aug 12 '15 at 13:09
  • You may have to separate out those special cases. – akrun Aug 12 '15 at 13:11
  • Almost 50-60% of my data set has values like I have mentioned. So how can I clean it efficiently?? – Ramprakash V Aug 12 '15 at 13:13
  • Okay, I reopened the question. – akrun Aug 12 '15 at 13:13
  • It may be better if you post with all the possible cases in your dataset – akrun Aug 12 '15 at 13:22
  • These 5 are the possible cases. But day and month are not clear since they are in single digits. This kind of data fills most of my data – Ramprakash V Aug 12 '15 at 13:43
  • I have a "heuristic" function which guesses over all common formats, but even it cannot decide mm-dd vs dd-mm: see https://github.com/eddelbuettel/rcppbdt/blob/master/demo/toPOSIXct.R – Dirk Eddelbuettel Aug 12 '15 at 13:46

1 Answers1

3

The easiest way to do this is to use the lubridate package.

Date=c( 
   "01/01/2013" #mm/dd/yyyy  
  ,"12122015" #mmddyyyy  
  ,"2014-03-21" #yyyy-mm-dd  
  ,"95.10.12" #yy.mm.dd  
)

library(lubridate)

# list of functions in lubridate that 
# translate text to POSIXct: you only need to know the
# order of the data (e.g. mdy = month-day-year).
funs <- c("mdy","dym","ymd","dmy","myd","ydm")

# vector to store results
dates <- as.POSIXct(rep(NA,length(Date)))

# we try everything lubridate has. There will be some warnings
# e.g. because mdy cannot translate everything. You can ignore this.
for ( f in funs ){
  dates[is.na(dates)] <- do.call(f,list(Date[is.na(dates)]))  
}
dates

> dates
[1] "2013-01-01 01:00:00 CET" "2015-12-12 01:00:00 CET" "2013-01-01 01:00:00 CET" "2015-12-12 01:00:00 CET"