
I have a column of birthdates for my participants and I need their ages measured in months. Is there an automatic way to do that which accounts for the fact that months have different numbers of days and that years vary in length as well (among other things)? What I mean is: I know I can assign an arbitrary 30 or 31 days to months, or 365 or 366 days to years, but I'm wondering whether there is a way to make R use the ACTUAL calendar, as Sys.Date() does, so that I can be more precise. This is what I haven't seen in the other questions [EDITED]

  • data:
head(data)
  ID      BIRTH YEAR
1  A 23/04/2009 2009
2  B 24/03/2010 2010
3  C 28/12/2009 2009
  • I need to obtain the participants' ages in months as of a specific date. Let's say from their birth until 31/08/2020, for example (note: dates are in the Brazilian notation DAY/MONTH/YEAR). Any ideas?

I've seen many interesting posts, such as this one, but they didn't quite solve what I need, so I hope this is not a duplicate. I've also seen suggestions such as difftime("23/04/2009", "31/08/2020", units = 'weeks'), but difftime doesn't work for months.

  • data:
> dput(data)
structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"
), BIRTH = c("23/04/2009", "24/03/2010", "28/12/2009", "19/08/2009", 
"02/12/2009", "12/05/2010", "26/02/2010", "07/10/2009", "22/04/2010", 
"01/04/2010", "31/03/2010", "27/01/2010", "23/09/2009", "28/09/2009", 
"28/10/2009", "21/06/2009", "28/10/2009", "19/08/2009", "10/09/2009", 
"13/07/2009", "22/09/2009"), YEAR = c("2009", "2010", "2009", 
"2009", "2009", "2010", "2010", "2009", "2010", "2010", "2010", 
"2010", "2009", "2009", "2009", "2009", "2009", "2009", "2009", 
"2009", "2009")), row.names = c(NA, -21L), class = "data.frame")
Larissa Cury
  • Maybe this answer will work for you: https://stackoverflow.com/questions/1995933/number-of-months-between-two-dates – juanbarq Jan 29 '23 at 15:17
  • Oh, I've seen that post before too, I forgot to add it to my post. Yeah, that was the closest I got, but I still couldn't get my head around how to use it to solve my problem @juanbarq – Larissa Cury Jan 29 '23 at 15:23

3 Answers


First, make sure both of your dates are in date format (one option is lubridate's dmy function). Once they are dates we can do arithmetic on them, that is, subtract one date from the other. The trick is to wrap the whole thing in as.numeric to get a plain number of days. To get months from days, we divide by 365/12:

library(lubridate)
library(dplyr)
data %>% 
  mutate(Age_in_Months = as.numeric(dmy("31/08/2020") - dmy(BIRTH)) / 365/12)
  ID      BIRTH YEAR Age_in_Months
1   A 23/04/2009 2009     0.9470320
2   B 24/03/2010 2010     0.8705479
3   C 28/12/2009 2009     0.8901826
4   D 19/08/2009 2009     0.9200913
5   E 02/12/2009 2009     0.8961187
6   F 12/05/2010 2010     0.8593607
7   G 26/02/2010 2010     0.8764840
8   H 07/10/2009 2009     0.9089041
9   I 22/04/2010 2010     0.8639269
10  J 01/04/2010 2010     0.8687215
11  K 31/03/2010 2010     0.8689498
12  L 27/01/2010 2010     0.8833333
13  M 23/09/2009 2009     0.9121005
14  N 28/09/2009 2009     0.9109589
15  O 28/10/2009 2009     0.9041096
16  P 21/06/2009 2009     0.9335616
17  Q 28/10/2009 2009     0.9041096
18  R 19/08/2009 2009     0.9200913
19  S 10/09/2009 2009     0.9150685
20  T 13/07/2009 2009     0.9285388
21  U 22/09/2009 2009     0.9123288
TarJae
  • @TarJae, hi, thanks, I almost got that. I'm struggling to understand this part: ```as.numeric(dmy("31/08/2020"))```, which equals 18505. What does this number mean? – Larissa Cury Jan 30 '23 at 12:27
  • Dates can be represented in different ways. One is as an integer number of days since "1970-01-01". For example, with `library(lubridate)`, `as_date(1)` gives `[1] "1970-01-02"`, i.e. 1970-01-01 plus one day. `as_date(10)` gives `[1] "1970-01-11"`, i.e. 1970-01-01 plus 10 days. And finally `as_date(18505)` gives `[1] "2020-08-31"`. Both forms are the same date, with the difference that in numeric form you can do calculations. I use lubridate because for me it makes dates easier to think about! – TarJae Jan 30 '23 at 13:06
  • There is something wrong here, as all 'age in month' values are between 0 and 1. – Dirk Eddelbuettel Jan 31 '23 at 19:44
  • You are right. Hmm, I will check when I am at my desktop. Thanks. Do you have any idea? – TarJae Jan 31 '23 at 20:26
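The age values in this answer's output all fall between 0 and 1 because of operator precedence: `x / 365/12` is evaluated left to right as `(x / 365) / 12`, not as `x / (365/12)`. Below is a minimal base-R sketch of the calculation the answer text describes, assuming an average month of 365/12 days. The vector `birth` is just the first three rows of the question's data, and the names `ref`, `days`, `wrong` and `right` are made up for illustration.

```r
# First three birthdates from the question (DAY/MONTH/YEAR)
birth <- c("23/04/2009", "24/03/2010", "28/12/2009")
ref   <- as.Date("31/08/2020", format = "%d/%m/%Y")

# Date - Date gives a difftime in days; as.numeric() strips the class
days <- as.numeric(ref - as.Date(birth, format = "%d/%m/%Y"))

wrong <- days / 365/12     # parsed as (days / 365) / 12
right <- days / (365/12)   # days divided by the average month length

round(right, 2)
# 136.37 125.36 128.19
```

This still treats every month as about 30.42 days, so it is an approximation rather than a calendar-exact count.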

Using the functions from the answer here, you can calculate the age difference in months as shown in Dif_months1. For a more accurate result you can use interval from lubridate; see Dif_months2:

library(lubridate)
library(dplyr)  # provides %>% and mutate()
data <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", 
                      "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"
), BIRTH = as.Date(c("23/04/2009", "24/03/2010", "28/12/2009", "19/08/2009", 
             "02/12/2009", "12/05/2010", "26/02/2010", "07/10/2009", "22/04/2010", 
             "01/04/2010", "31/03/2010", "27/01/2010", "23/09/2009", "28/09/2009", 
             "28/10/2009", "21/06/2009", "28/10/2009", "19/08/2009", "10/09/2009", 
             "13/07/2009", "22/09/2009"), format="%d/%m/%Y"), YEAR = c("2009", "2010", "2009", 
                                                   "2009", "2009", "2010", "2010", "2009", "2010", "2010", "2010", 
                                                   "2010", "2009", "2009", "2009", "2009", "2009", "2009", "2009", 
                                                   "2009", "2009")), row.names = c(NA, -21L), class = "data.frame")

# turn a date into a 'monthnumber' relative to an origin
monnb <- function(d) { lt <- as.POSIXlt(as.Date(d, origin="1900-01-01")); lt$year*12 + lt$mon } 
# compute a month difference as a difference between two monnb's
mondf <- function(d1, d2) { monnb(d2) - monnb(d1) }

data %>% mutate(Dif_months1 = mondf(BIRTH, Sys.Date()), 
                Dif_months2 = interval(BIRTH, Sys.Date()) %/% days(1) / (365/12))

Note that I parse your original dates with format="%d/%m/%Y".

Output:

   ID      BIRTH YEAR Dif_months1 Dif_months2
1   A 2009-04-23 2009         165    165.3370
2   B 2010-03-24 2010         154    154.3233
3   C 2009-12-28 2009         157    157.1507
4   D 2009-08-19 2009         161    161.4575
5   E 2009-12-02 2009         157    158.0055
6   F 2010-05-12 2010         152    152.7123
7   G 2010-02-26 2010         155    155.1781
8   H 2009-10-07 2009         159    159.8466
9   I 2010-04-22 2010         153    153.3699
10  J 2010-04-01 2010         153    154.0603
11  K 2010-03-31 2010         154    154.0932
12  L 2010-01-27 2010         156    156.1644
13  M 2009-09-23 2009         160    160.3068
14  N 2009-09-28 2009         160    160.1425
15  O 2009-10-28 2009         159    159.1562
16  P 2009-06-21 2009         163    163.3973
17  Q 2009-10-28 2009         159    159.1562
18  R 2009-08-19 2009         161    161.4575
19  S 2009-09-10 2009         160    160.7342
20  T 2009-07-13 2009         162    162.6740
21  U 2009-09-22 2009         160    160.3397
juanbarq
  • Hi, thanks :) I saw that answer in the original post before, but I didn't quite get it. I don't understand what 'monnb' is actually doing, since we have to set an arbitrary origin for it. Does it calculate everything from '1900-01-01', or is the origin just there to get the correct date format? And when we do ```days(1) / (365/12)``` we're kind of 'arbitrarily' defining 30 as a month, right? – Larissa Cury Jan 30 '23 at 12:47
  • Hi @larissa-cury! The origin parameter is optional in `as.Date`; you can use `as.Date(d)` and it works the same. And yes, with `(365/12)` you are estimating each month as about 30.42 days – juanbarq Jan 30 '23 at 14:02
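For anyone who wants a count that follows the actual calendar instead of an average month length, here is a minimal base-R sketch of completed calendar months. It relies on the same `as.POSIXlt` fields that `monnb` uses. The helper name `whole_months` is hypothetical, and the rule "a month only counts once the birth day-of-month has been reached" is an assumption about what "age in months" should mean.

```r
# Hypothetical helper: completed calendar months between two Dates.
whole_months <- function(birth, ref) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(ref)
  # $year counts years since 1900 and $mon runs from 0 to 11, so
  # year * 12 + mon is a running month index (this is what monnb() builds).
  months <- (r$year * 12 + r$mon) - (b$year * 12 + b$mon)
  # Do not count the current month if the birth day-of-month is not reached yet.
  months - as.integer(r$mday < b$mday)
}

# First three birthdates from the question, parsed as DAY/MONTH/YEAR
birth <- as.Date(c("23/04/2009", "24/03/2010", "28/12/2009"), format = "%d/%m/%Y")
whole_months(birth, as.Date("2020-08-31"))
# 136 125 128
```

Under this rule someone born on the 31st is not counted as a month older on the last day of a shorter month, so adjust the last line of the helper if your study uses a different convention. With lubridate loaded, `interval(BIRTH, ref) %/% months(1)` is a package-based route to a whole-month count.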

As you noted, difftime() does not work for "month" because it is not possible to give a "definite" answer: one could divide by 30, or 31, or use the days in the year, accounting for leap years or not, and so on.
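A small sketch of that limitation, using one birthdate from the question already written in ISO format. The variable names are made up for illustration, and the `tryCatch` wrapper is only there so the snippet runs to the end:

```r
d1 <- as.Date("2009-04-23")
d2 <- as.Date("2020-08-31")

# "weeks" is one of the fixed-length units difftime() accepts
w <- difftime(d2, d1, units = "weeks")

# "months" is not among the accepted units, so the call errors
res <- tryCatch(
  difftime(d2, d1, units = "months"),
  error = function(e) "not supported"
)
res
```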

Otherwise, your problem is a two-liner. Assume your data is in data.frame D:

D <- within(D, bd <- as.Date(BIRTH, "%d/%m/%Y")) 
D <- within(D, dm <- as.numeric(difftime(as.Date("2020-08-31"), bd))/30) 

where we first parse BIRTH as a Date and then use difftime, scaling by an (arbitrarily chosen) 30 days per month. This gets us an expanded data.frame:

> D
   ID      BIRTH YEAR         bd      dm
1   A 23/04/2009 2009 2009-04-23 138.267
2   B 24/03/2010 2010 2010-03-24 127.100
3   C 28/12/2009 2009 2009-12-28 129.967
4   D 19/08/2009 2009 2009-08-19 134.333
5   E 02/12/2009 2009 2009-12-02 130.833
6   F 12/05/2010 2010 2010-05-12 125.467
7   G 26/02/2010 2010 2010-02-26 127.967
8   H 07/10/2009 2009 2009-10-07 132.700
9   I 22/04/2010 2010 2010-04-22 126.133
10  J 01/04/2010 2010 2010-04-01 126.833
11  K 31/03/2010 2010 2010-03-31 126.867
12  L 27/01/2010 2010 2010-01-27 128.967
13  M 23/09/2009 2009 2009-09-23 133.167
14  N 28/09/2009 2009 2009-09-28 133.000
15  O 28/10/2009 2009 2009-10-28 132.000
16  P 21/06/2009 2009 2009-06-21 136.300
17  Q 28/10/2009 2009 2009-10-28 132.000
18  R 19/08/2009 2009 2009-08-19 134.333
19  S 10/09/2009 2009 2009-09-10 133.600
20  T 13/07/2009 2009 2009-07-13 135.567
21  U 22/09/2009 2009 2009-09-22 133.200
> 

You could also replace BIRTH, of course, or just use vectors. I like keeping the data together. You could make the reference date an argument of a helper function if you desire.
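One way such a helper might look is sketched below. The function name `age_in_months` and its arguments are hypothetical, and the default of 30 days per month simply mirrors the arbitrary choice made above.

```r
# Hypothetical helper: age in months at a reference date,
# using a fixed (arbitrary) number of days per month.
age_in_months <- function(birth, ref = "2020-08-31", days_per_month = 30) {
  bd <- as.Date(birth, format = "%d/%m/%Y")   # parse DAY/MONTH/YEAR strings
  as.numeric(difftime(as.Date(ref), bd, units = "days")) / days_per_month
}

# First three rows of the question's data
D <- data.frame(ID = c("A", "B", "C"),
                BIRTH = c("23/04/2009", "24/03/2010", "28/12/2009"))
D$dm <- age_in_months(D$BIRTH)
round(D$dm, 3)
# 138.267 127.100 129.967
```

Passing `days_per_month = 365/12` instead gives the average-month variant without touching the rest of the code.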

No other packages needed for the two-liner. And, if you wanted to, a one-liner also works where the reference date is still the one parameter used besides the input data:

D <- within(D, dm <- as.numeric(difftime(as.Date("2020-08-31"),
        as.Date(BIRTH, "%d/%m/%Y"))/30))

(which I broke over two lines for the display but is really just one).

Dirk Eddelbuettel
  • Hi, @Dirk Eddelbuettel, thanks. So, this was my first choice, but I'm trying to avoid having to choose arbitrarily between 30, 31 or 29 days per month. Also, I couldn't run the second line of the code: ```Error in UseMethod("within") : no applicable method for 'within' applied to an object of class "function"``` – Larissa Cury Jan 30 '23 at 12:20
  • I copied and pasted from a worked example. You can also reduce it to _one_ line if you first assign your data to `D`. Then it is `D <- within(D, dm <- as.numeric(difftime(as.Date("2020-08-31"), as.Date(BIRTH, "%d/%m/%Y"))/30))` which is the same as before but skips an _explicit_ conversion of `BIRTH`. Ensure you have a data.frame with the correct name, and ensure you use a `<-` inside `within`. – Dirk Eddelbuettel Jan 30 '23 at 12:37
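A likely cause of that error (an assumption, since the asker's session isn't shown): if `D` has never been assigned, R finds the built-in function `stats::D` under that name, and `within()` has no method for functions. The same applies to the name `data`, which resolves to `utils::data` until you assign your own object. A minimal sketch with a two-row stand-in for the real data:

```r
# In a fresh session the name D refers to a function, not a data.frame
is.function(stats::D)

# Assign the data to D first (two rows from the question as a stand-in)
D <- data.frame(ID = c("A", "B"),
                BIRTH = c("23/04/2009", "24/03/2010"))

# Now within() dispatches on a data.frame and the one-liner runs
D <- within(D, dm <- as.numeric(difftime(as.Date("2020-08-31"),
        as.Date(BIRTH, "%d/%m/%Y"), units = "days")) / 30)
round(D$dm, 3)
# 138.267 127.100
```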