I have come across some unusual behavior when calculating dates in R.

I am trying to estimate the date of conception by back-calculating 9 months from the date of birth, where the date of birth is in the format `as_date("1946-03-31")`.

So, the calculation would look like

as_date("1946-03-31")- months(9)

However, instead of getting a date, I get NA.

Strangely, if I change the month number, I get a perfect result:

as_date("1946-03-31")- months(8)

or

as_date("1946-03-31")- months(10)

Both work.

I also get a perfect date if I use days:

as_date("1946-03-31")- days(273)

Is there a totally obvious reason for this that I'm missing? If not, is this replicable across other setups?

Calen
  • Try _periods_ instead: `as_date("1946-03-31") - weeks(39)` – Dirk Eddelbuettel Mar 10 '23 at 02:27
  • Yes, thank you. I realized I could simply use `days(274)`. I decided to post not for a solution, but to inquire about what appears to be internal inconsistency around `as_date("1946-03-31") - months(8)` and `as_date("1946-03-31") - months(10)` working but `as_date("1946-03-31") - months(9)` not working. This strikes me as unusual. – Calen Mar 10 '23 at 02:47
  • There are numerous FAQs at the sites of the various date/time libraries -- a difference is often made between 'intervals' and 'periods'. Some have operators so that 'May 30 plus three months' gets you 'August 30'. Keep reading around for background stuff, there simply are pitfalls in these topics "hence no quick and simple answers". – Dirk Eddelbuettel Mar 10 '23 at 02:52
  • Yeah, I've done a bit of reading on it, but getting dates for a difference of 10 and 8 months but not 9 really threw me for a loop. Guess it's just part of doing business when working with dates. – Calen Mar 10 '23 at 02:55

2 Answers

Months are poorly defined. What is 1 month before March 30? It's not well-defined because February has fewer days than March.

What answer do you expect for as_date("1946-03-31") - months(9)? Is it the day after as_date("1946-03-30") - months(9), which would be 1945-07-01? Or is it the day before as_date("1946-04-01") - months(9), which would be 1945-06-30? Both would be somewhat reasonable, but they are different because June has 1 day less than March.
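
The ambiguity can be reproduced directly. The following is a minimal sketch, assuming lubridate is installed; the variable names are illustrative only. It shows the three subtractions from the question and then the two candidate answers discussed above.

```r
library(lubridate)

birth <- as_date("1946-03-31")

# Subtract 8, 9 and 10 months, as in the question
eight <- birth - months(8)   # 1945-07-31, a real date
nine  <- birth - months(9)   # would be 31 June 1945, which does not exist
ten   <- birth - months(10)  # 1945-05-31, a real date

print(eight)
print(nine)   # NA
print(ten)

# The two "somewhat reasonable" answers for the 9-month case
day_after  <- (as_date("1946-03-30") - months(9)) + days(1)
day_before <- (as_date("1946-04-01") - months(9)) - days(1)

print(day_after)   # "1945-07-01"
print(day_before)  # "1945-06-30"
```

The two candidates differ by one day, and plain period arithmetic returns `NA` rather than picking one of them.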

"9 months" as a human gestation period is also a really rough estimate. When I google "gestation period for humans" I get 280 days, so I'd suggest using that instead. Days are nice and consistent. (Or look for a better estimate; Wikipedia has a nice page. Things will depend on whether you are trying to estimate the fertilization date or the date of the last menstruation.)

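A day-based calculation can be sketched in base R alone, since subtracting a number from a `Date` moves it by that many days. The 280-day figure is the rough estimate quoted above, not a recommendation; `gestation_days` is a name made up for this example.

```r
# Date of birth from the question
birth <- as.Date("1946-03-31")

# Assumption: 280 days, the rough figure quoted in this answer
gestation_days <- 280

# Date arithmetic in days is always well-defined
conception_estimate <- birth - gestation_days
print(conception_estimate)
# [1] "1945-06-24"
```

Swapping in a different estimate only requires changing `gestation_days`.
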
Gregor Thomas
  • All of that is fine - but why then would `as_date("1946-03-31") - months(8)` and `as_date("1946-03-31") - months(10)` work, where `as_date("1946-03-31") - months(9)` does not? I'm pointing out internal inconsistency in the function and getting downvoted. – Calen Mar 10 '23 at 02:44
  • @Calen - Agree the downvote is a bit harsh (so upvoted to even it out). You've asked a reasonable question. Just to expand on Gregor's answer and say explicitly why you get an `NA` result for `months(9)` but not for `months(8)` or `months(10)`: the result of `as_date("1946-03-31") - months(9)` *would be* the 31st of June if that were a valid date, but it isn't, so `NA` is returned. However, the 31st of May and the 31st of July are valid dates, which is why the other calculations work. – Ritchie Sacramento Mar 10 '23 at 03:08
  • Not sure who's downvoting you for this question. Regarding the "internal inconsistency", I'll reiterate my question back to you: *"What do you expect the response to be?"* Producing results when they are defined and producing `NA` when they are not defined is standard - and R is internally consistent here. `log(1)` is `0` and `log(-1)` is `NaN` (with a warning). This makes sense from the definition of "logarithm". `as_date('2023-01-28') + months(1)` is 2023-02-28 and `as_date('2023-01-31') + months(1)` is `NA`. This makes sense from the definition of "month". – Gregor Thomas Mar 10 '23 at 04:51
  • Adding or subtracting months "works" when the day number is within bounds of the resulting month. Otherwise the operation is ill-defined. Logarithms work when the input is > 0, otherwise the operation is not defined. – Gregor Thomas Mar 10 '23 at 04:53
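
The "day number within bounds" rule from the comments can be checked with a short sketch, assuming lubridate is installed. The variable names are made up for illustration; `days_in_month()` is lubridate's helper for the length of a date's month.

```r
library(lubridate)

# First day of the target month when going back 8, 9 and 10 months from March 1946
targets <- as_date(c("1945-07-01", "1945-06-01", "1945-05-01"))

# Number of days in each target month (names dropped for readability)
month_lengths <- unname(days_in_month(targets))
print(month_lengths)
# [1] 31 30 31

# Day 31 only exists where the target month has at least 31 days
day_fits <- 31 <= month_lengths
print(day_fits)
# [1]  TRUE FALSE  TRUE
```

Only the 9-month case fails the check, which matches the single `NA` in the question.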

As nicely described by @GregorThomas and @RitchieSacramento, arithmetic with months is often ambiguous since they have inconsistent numbers of days. One approach is provided by lubridate’s %m+% and %m-% operators. As described in the docs:

Date %m+% months(n) always returns a date in the nth month after Date. If the new date would usually spill over into the n + 1th month, %m+% will return the last day of the nth month (rollback()). Date %m-% months(n) always returns a date in the nth month before Date.

With your example:

library(lubridate)

as_date("1946-03-31") %m-% months(9)
# "1945-06-30"
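
If the first day of the following month is preferred over the last day of the target month, lubridate's `add_with_rollback()` accepts a `roll_to_first` argument. This is a sketch of that alternative rather than a recommendation of either convention; the variable names are illustrative.

```r
library(lubridate)

birth <- as_date("1946-03-31")

# Default: roll back to the last day of the target month (same as %m-%)
last_of_month <- add_with_rollback(birth, months(-9))
print(last_of_month)
# [1] "1945-06-30"

# roll_to_first = TRUE: use the first day of the following month instead
first_of_next <- add_with_rollback(birth, months(-9), roll_to_first = TRUE)
print(first_of_next)
```

These correspond to the two candidate answers for "9 months before 31 March", so the choice depends on which convention suits the analysis.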
zephryl