3

I have a csv file which has date in following format.

8/13/2016
8/13/2016
8/13/2016
2016-08-13T08:26:04Z
2016-08-13T14:30:23Z
8/13/2016
8/13/2016

When I import this into R it takes it as a character. I want to convert it into Date format,but when I convert it into date format it takes all NA values

as.Date(df$create_date,format="%m%d%y")

Date field in CSV has different formats in which date is recorded. How can I convert it into date format in R

Neil
  • 7,937
  • 22
  • 87
  • 145
  • You have multiple format in the dataset. You can try with `parse_date_time` from `lubridate` which can take multiple formats – akrun Aug 21 '16 at 04:58
  • @akrun I have used parse_date_time,but it returns NA value for ` 2016-08-13T08:26:04Z – Neil Aug 21 '16 at 05:07
  • It should look like `lubridate::parse_date_time(df$V1, c('mdy', 'ymd_hms'))` – alistaire Aug 21 '16 at 05:08
  • 1
    Also relevant http://stackoverflow.com/questions/13764514/how-to-change-multiple-date-formats-in-same-column also this http://stackoverflow.com/questions/28564260/converting-multiple-date-formats-in-a-column-to-a-single-form and this too http://stackoverflow.com/questions/25463523/convert-variable-with-mixed-date-formats-to-one-format-in-r and this http://stackoverflow.com/questions/34389139/format-multiple-date-formats-in-one-columns-using-lubridate and this https://www.google.com – David Arenburg Aug 21 '16 at 07:25

1 Answers1

6

A base R option (assuming that there are only two formats in the OP's 'create_date' column), will be to create a logical index with grepl for those date elements that start with 'year', subset the 'create_date' based on the logical index ('i1'), convert to Date class separately and assign those separately to a Date vector of the same length as the number of rows of the dataset to create the full Date class.

i1 <- grepl("^[0-9]{4}", df$create_date)
v1 <- as.Date(df$create_date[i1])
v2 <- as.Date(df$create_date[!i1],  "%m/%d/%Y")
v3 <- Sys.Date() + 0:(nrow(df)-1)   
v3[i1] <- v1
v3[!i1] <- v2
df$create_date <- v3

Or as I commented in the OP's post (first) parse_date_time from lubridate can be used

library(lubridate)
as.Date(parse_date_time(df$create_date, c('mdy', 'ymd_hms')))
#[1] "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13" 
#[5] "2016-08-13" "2016-08-13" "2016-08-13"

data

df <- structure(list(create_date = c("8/13/2016", "8/13/2016", 
"8/13/2016", 
"2016-08-13T08:26:04Z", "2016-08-13T14:30:23Z", "8/13/2016", 
"8/13/2016")), .Names = "create_date", class = "data.frame",
 row.names = c(NA, -7L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • What if my date is in format `Sat Aug 13 15:27:16 GMT 2016` – Neil Aug 21 '16 at 10:20
  • 1
    @Neil Try `as.Date(str1, "%a %b %d %H:%M:%S GMT %Y") #[1] "2016-08-13"` where `str1 <- "Sat Aug 13 15:27:16 GMT 2016"` – akrun Aug 21 '16 at 13:40
  • Can I use above in `parse_date_time` ? Like `as.Date(parse_date_time(df$create_date, c('mdy', 'ymd_hms','%a %b %d %H:%M:%S GMT %Y')))` – Neil Aug 21 '16 at 14:54
  • 1
    @Neil You may need `as.Date(parse_date_time(c(df$create_date, str1), c('mdy', 'ymd_hms','a b d HMS Y')))#[1] "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13"` – akrun Aug 21 '16 at 14:58
  • 1
    Awesome.. Worked like charm. Thanks a lot for your answer and time. Appreciated. – Neil Aug 21 '16 at 15:03