0

I have looked around for ages trying to find what I am looking for but none of the code has given me what I want. I need to create a variable that calculates the difference in months between two date variables.

For example, if I have the data below:

start_date   end_date
2010-01-01  2010-12-31
2016-05-01  2016-12-31
2004-03-01  2004-10-31
1997-10-01  1998-08-31

I would like the outcome to look like the following:

start_date   end_date    month_count
2010-01-01  2010-12-31       12
2016-05-01  2016-12-31       8
2004-03-01  2004-10-31       8
1997-10-01  1998-08-31       11

Meaning I would like the whole last month to be included. Many of the codes I have checked have given me 11 months for the first observation instead of 12 for example. Also, many codes have said to specify the actual date but as I have a large dataset I can't do that, and would need to go by the variables instead.

Thank you in advance!

  • You get 11 month in return, because from 2010-01-01 00:00:00 to 2010-12-31 00:00:00 is not 12 months, but 11 + 30/31 month. Just add one day to the end_date, en perform your operation again. – Wimpel Apr 02 '21 at 08:13
  • if all the dates have the 1 to 30 or 31 day gap you could do it by just adding 1 to the month. even something as simple as `month_count<-(lubridate::month(end_date)-lubridate::month(start_date))+1` would work – D.J Apr 02 '21 at 08:17
  • How can a project end before it started (row 2)? Given your comments below shouldn't the output for row 3 also count for years in between? Please clarify! – AnilGoyal Apr 02 '21 at 09:16
  • so sorry @AnilGoyal I had been mistaken when writing the question, I have now done the necessary edits/changes above for it to be correct – coding_sailor Apr 02 '21 at 09:38
  • how do I use the year to account for this as you mentioned below? – coding_sailor Apr 02 '21 at 09:39
  • Use `floor_date` and `ceiling_date` in that case from lubridate – AnilGoyal Apr 02 '21 at 09:44
  • could you give me an example? I am quite new to R and haven't really gotten the hang of it yet, thanks :) – coding_sailor Apr 02 '21 at 09:45
  • See [this question](https://stackoverflow.com/questions/1995933/number-of-months-between-two-dates) for reference – AnilGoyal Apr 02 '21 at 09:46

2 Answers2

1

dplyr way

library(lubridate)
library(dplyr)

df %>% mutate(across(everything(), ~as.Date(.))) %>%
  mutate(months = (year(end_date) - year(start_date) * 12) + month(end_date) - month(start_date) + 1)
 
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
0

Here is a possible way:

library(data.table)

dtt <- fread(text = 'start_date   end_date
2010-01-01  2010-12-31
2016-05-01  2010-12-31
2004-03-01  2010-10-31')

dtt[, month_count := month(end_date) - month(start_date) + 1]
dtt
#    start_date   end_date month_count
# 1: 2010-01-01 2010-12-31          12
# 2: 2016-05-01 2010-12-31           8
# 3: 2004-03-01 2010-10-31           8
mt1022
  • 16,834
  • 5
  • 48
  • 71