I'm fairly new to R, but I've had a great deal of challenge doing what otherwise seems like a simple task.
I have a number of inconsistently recorded dates in one vector. My goal is to convert them all to the same R accepted date format in a new vector labeled clean_end_date
. Thus far, I have been able to use a combination of as.Date
and grep
to format almost all into the general %Y-%m-%d
format in R:
d$clean_end_date[grep("[0-9]{2}/[0-9]{2}/[0-9]{4}", d$End_Date, value=F)] <- as.character(as.Date(grep("[0-9]{2}/[0-9]{2}/[0-9]{4}", d$End_Date, value=T, fixed = FALSE), format="%d/%m/%Y"))
However, there are several dates that won't convert regardless of how I manipulate the code. The rows that produce NAs use dashes instead of slashes as separators. Different date formats have the same number of characters, so counting them won’t help distinguish between them. I can either identify these rows manually, or use a regular expressions string matching function to identify the format. I thought grep()
would help here, but it does not.
The code I have used to format most of it is here:
d$clean_end_date <- NA
d$clean_end_date[nchar(as.character(d$End_Date))<10] <- as.character(as.Date(d$End_Date[nchar(as.character(d$End_Date))<10], format="%m/%d/%y"))
d$clean_end_date[grep("[0-9]{4}/[0-9]{2}/[0-9]{2}", d$End_Date, value=F)] <- as.character(as.Date(grep("[0-9]{4}/[0-9]{2}/[0-9]{2}", d$End_Date, value=T, fixed = FALSE), format="%Y/%m/%d"))
d$clean_end_date[grep("[0-9]{4}-[0-9]{2}-[0-9]{2}", d$End_Date, value=F)] <- as.character(as.Date(grep("[0-9]{4}-[0-9]{2}-[0-9]{2}", d$End_Date, value=T, fixed = FALSE), format="%Y-%m-%d"))
d$clean_end_date[grep("[0-9]{2}/[0-9]{2}/[0-9]{4}", d$End_Date, value=F)] <- as.character(as.Date(grep("[0-9]{2}/[0-9]{2}/[0-9]{4}", d$End_Date, value=T, fixed = FALSE), format="%d/%m/%Y"))
d$clean_end_date[d$Community_id==42 & nchar(as.character(d$End_Date))==10] <- as.character(as.Date(d$End_Date[d$Community_id==42 & nchar(as.character(d$End_Date))==10], format="%m/%d/%Y"))
However, I am having a difficulty formatting dates in the format "11/31/2015" and "2014-02-29". I suspect this might be because R cannot distinguish between this format and formats of the same length like "2015/11/31" and "02-29-2014".
I would really appreciate help on this. I am fairly new to R, so would especially appreciate an answer that doesn't assume I speak R language fluently.