I would like a function that takes three columns of a data frame with the year, month, and day and constructs a valid date object. I would like the behavior an syntax to be like the Excel DATE function. Building on the answer to another question, I have the following function:
ExcelDate <- function(yearCol, monthCol, dayCol) {
# Syntax: DATE(year,month,day)
# Requires lubridate and magrittr which are both in tidyverse
ReturnCol = paste(yearCol, monthCol, dayCol, sep="-") %>% ymd() %>% as.Date()
return(ReturnCol)
}
Note that Excel dates have an integer numbering system relative to 1900 that I'm not looking to duplicate, I just want the year / month / day input format and the output date column formated as a date object.
However, the Excel function handles invalid inputs in a way that is very useful for date arithmetic. See the following picture for examples. Basically, the 0th day of the month is the last day of the prior month and the 0th month is the last month of the prior year. Any input t-days beyond the normal month length are converted into the t-th day of the following month (or year if needed.
There are lots of fiddly edge cases here (different month lengths, leap years, inputs that shift over multiple months or years) that are making me reluctant to write such a thing from scratch. What's a vectorized way of accomplishing this that uses existing date libraries to make this easy and relatively fast?
Year (input) | Month (input) | Day (input) | Date (output) | Notes |
---|---|---|---|---|
2020 | 2 | 17 | 2/17/2020 | Code handles this case |
2020 | 2 | 30 | 3/1/2020 | Too many days |
2020 | 2 | -1 | 1/30/2020 | Too few days |
2020 | 2 | 0 | 1/31/2020 | Too few days |
2020 | 13 | 17 | 1/17/2021 | Too many months |
2020 | 0 | 17 | 12/17/2019 | Too few months |
2020 | 2 | 67 | 4/7/2020 | Way too many days |
2020 | 2 | 670 | 12/1/2021 | Way way too many days |
2020 | 2 | 29 | 2/29/2020 | Leap year |
2021 | 2 | 29 | 3/1/2021 | Not a leap year |
2100 | 2 | 29 | 3/1/2100 | Not a leap year (surprise!) |
Here is sample code to load the data and the desired answer into R:
df = data.frame(
year = c(2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2021, 2100),
month = c(2, 2, 2, 2, 13, 0, 2, 2, 2, 2, 2),
day = c(17, 30, -1, 0, 17, 17, 67, 670, 29, 29, 29),
date = mdy(c("2/17/2020", "3/1/2020", "1/30/2020", "1/31/2020", "1/17/2021", "12/17/2019", "4/7/2020", "12/1/2021", "2/29/2020", "3/1/2021", "3/1/2100"))
)