I have a column of dates and want to extract the year, month, and day into separate columns. Unfortunately, the dates column contains inconsistent entries, so the usual solutions of format(as.Date(),"%Y") or lubridate::year() don't work.
Here is an example dataframe:
dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"))
And here is the desired result:
       dates year month  day
1 1985-03-23 1985     3   23
2            <NA>  <NA> <NA>
3       1983 1983  <NA> <NA>
4    1984-01 1984     1 <NA>
I can achieve the desired result with the following code, but it is very slow on large datasets (>100,000 rows):
dates_df$year <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[1])
dates_df$month <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[2])
dates_df$day <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[3])
My question:
Is there a faster way to extract year, month, and day columns from messy date data?
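One direction that might help, offered as a rough sketch rather than a benchmarked answer: the code above calls strsplit() once per row and per column, so each string is split three times. Splitting the whole column in a single vectorized strsplit() call and then picking out the pieces should do noticeably less work. The helper name pick below is made up for illustration, and the sketch assumes every entry is either empty or uses "-" as the only separator.

```r
# Example data from the question (stringsAsFactors = FALSE keeps dates as character)
dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"),
                       stringsAsFactors = FALSE)

# Split every string exactly once; fixed = TRUE skips the regex engine
parts <- strsplit(dates_df$dates, "-", fixed = TRUE)

# Hypothetical helper: take the i-th piece of each split.
# Indexing past the end of a vector yields NA, which covers short or empty entries.
pick <- function(i) as.integer(vapply(parts, `[`, character(1), i))

dates_df$year  <- pick(1)
dates_df$month <- pick(2)
dates_df$day   <- pick(3)

print(dates_df)
```

Because the pieces are converted with as.integer(), the month comes back as 3 rather than "03", and missing parts print as NA rather than <NA>. If the real data contains other separators or stray text, the conversion would produce NA with a warning, so it may be worth checking a sample of rows first.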