0

I'm trying to create a function that will automatically determine the date format of a column in a dataframe and apply the correct as.Date() function. Typically, the dates come in "%Y-%m-%d" or "%m/%d/%y" (this will change based on if the .csv has been opened and saved in Excel).

Initially, I thought an "if/else" statement would work, and came up with the following:

if(nchar(df$date[[1]] == 10)){
    df$Date <- as.Date(df$Date)
} else {
    df$Date <- as.Date(df$Date, format = "%m/%d/%y"

But it throws a "character string is not in a standard unambiguous format" error.

Here's a sample data frame to work with:

a <- seq(1:10)
dates1 <- c("3/21/16", "3/22/16", "3/23/16", "3/24/16", "3/25/16", "3/26/16", "3/27/16", "3/28/16", "3/29/16", "3/30/16")
dates2 <- c("2016-03-21", "2016-03-22", "2016-03-23", "2016-03-24", "2016-03-25", "2016-03-26", "2016-03-27", "2016-03-28", "2016-03-29", "2016-03-30")

df <- data.frame(a, dates1, dates2)

df$dates1 <- as.character(df$dates1)
df$dates2 <- as.character(df$dates2)

The if/else statement should be able to work on "dates1" and "dates2", but as you can see, it only works with "dates2"

if(nchar(df$dates1[[1]] == 10)){
      df$dates1 <- as.Date(df$dates1)
} else {
      df$dates1 <- as.Date(df$dates1, format = "%m/%d/%y")
}

if(nchar(df$dates2[[1]] == 10)){
      df$dates2 <- as.Date(df$dates2)
} else {
      df$dates2 <- as.Date(df$dates2, format = "%m/%d/%y")
}

Apologies in advance for any formatting issues.

Michael
  • 81
  • 9
  • 1
    You have to `dput()` the data for us to be able to solve this – Hack-R Dec 02 '16 at 20:02
  • This should answer your concern. It is exactly same, as what wrong you are doing . Solutions to overcome this also provided by @dirk..............................http://stackoverflow.com/questions/14755425/what-are-the-standard-unambiguous-date-formats – user5249203 Dec 02 '16 at 20:07
  • I think I can guess, but I want you to give a reproducible example. Your basic problem is that `as.Date` is vectorized by default; you either have to step through the elements of `df$Date` one at a time, or identify which ones are in which format and convert them separately ... – Ben Bolker Dec 02 '16 at 20:19
  • Just added a dataframe to reproduce the example. Apologies for not having it in the first place. And thanks for the help!!! – Michael Dec 02 '16 at 20:27

1 Answers1

1

You can use the lubridate package for this. For example:

date_object = mdy_hms('12-31-16 13:42:15')
date_object = mdy_hms('12/31/16 13-42-15')

all work nicely. lubridate will try and guess what the date is based on your rough estimate.

You can even provide a list of formats, and it will pick the appropriate one:

parse_date_time(c('12-31-16', '31-12-16'), c('mdy', 'dmy'))

this is especially handy when the date format changes halfway down a long dataset.

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149