19

I have two data frames in R. One frame has a person's year of birth:

YEAR
/1931
/1924

and the other has a column with a more recent date:

RECENT
09/08/2005
11/08/2005

I want to subtract the years to calculate each person's age in years, but I am not sure how to approach this. Any help, please?

Braiam
Brian
  • relevant: https://stackoverflow.com/questions/31126726/efficient-and-accurate-age-calculation-in-years-months-or-weeks-in-r-given-b?noredirect=1&lq=1 – moodymudskipper Nov 28 '17 at 10:32

8 Answers

51

The following function takes vectors of Date objects and calculates the ages, correctly accounting for leap years. It seems simpler than any of the other answers.

age = function(from, to) {
  from_lt = as.POSIXlt(from)
  to_lt = as.POSIXlt(to)

  age = to_lt$year - from_lt$year

  ifelse(to_lt$mon < from_lt$mon |
         (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}
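As a quick check of the function above, here is a small usage sketch. The function body is repeated so the snippet runs on its own; the birth dates and reference dates are made up for illustration and are not from the question.

```r
# Same logic as the function above, repeated so this snippet is self-contained.
age <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  age <- to_lt$year - from_lt$year
  # Subtract one year when the birthday has not yet been reached in the "to" year.
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}

# Hypothetical dates (illustrative only).
born <- as.Date(c("1931-06-15", "1980-12-15", "2000-02-29"))
ref  <- as.Date(c("2005-09-08", "2018-12-01", "2001-02-28"))

ages <- age(born, ref)
print(ages)  # 74 37 0
```

The second pair is the December case raised in the comments: the birthday month matches but the day has not been reached, so the age stays at 37. The third pair shows a leap-day birth date one day before the first birthday is counted.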
Jim
  • 6
    Clear, fast, and uses only base functions. Also handles leap years properly. Should be the top-voted answer. – nograpes Oct 01 '15 at 18:01
  • To [avoid `ifelse`](http://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow): `out <- integer(length(year)); out[idx <- to_lt$mon < from_lt$mon] <- age - 1; out[!idx] <- age]` – MichaelChirico Feb 21 '16 at 21:04
  • @MichaelChirico Please check your syntax before submitting. There's one `]` too many and even then, it doesn't work right. Imagine you were born December 1980 and now is December 2018... You do the math. – MS Berends Dec 14 '18 at 14:55
9

You can solve this with the lubridate package.

> library(lubridate)

I don't think /1931 is a common date class. So I'll assume all the entries are character strings.

> RECENT <- data.frame(recent = c("09/08/2005", "11/08/2005"))
> YEAR <- data.frame(year = c("/1931", "/1924"))

First, let's notify R that the recent dates are dates. I'll assume the dates are in month/day/year order, so I use mdy(). If they're in day/month/year order just use dmy().

> RECENT$recent <- mdy(RECENT$recent)
> RECENT
      recent
1 2005-09-08
2 2005-11-08

Now, let's turn the years into numbers so we can do some math with them.

> YEAR$year <- as.numeric(substr(YEAR$year, 2, 5))

Now just do the math. year() extracts the year value of the RECENT dates.

> year(RECENT$recent) - YEAR
  year
1   74
2   81

p.s. if your year entries are actually full dates, you can get the difference in years with

> YEAR1 <- data.frame(year = mdy("01/08/1931","01/08/1924"))
> as.period(RECENT$recent - YEAR1$year, units = "year")
[1] 74 years and 8 months   81 years and 10 months
Garrett
9

I use a custom function (code below). It is convenient to use in mutate and quite flexible; you'll need the lubridate package.

Examples

get_age("2000-01-01")
# [1] 17
get_age(lubridate::as_date("2000-01-01"))
# [1] 17
get_age("2000-01-01","2015-06-15")
# [1] 15
get_age("2000-01-01",dec = TRUE)
# [1] 17.92175
get_age(c("2000-01-01","2003-04-12"))
# [1] 17 14
get_age(c("2000-01-01","2003-04-12"),dec = TRUE)
# [1] 17.92176 14.64231

Function

#' Get age
#' 
#' Returns age, decimal or not, from single value or vector of strings
#' or dates, compared to a reference date defaulting to now. Note that
#' default is NOT the rounded value of decimal age.
#' @param from_date vector or single value of dates or characters
#' @param to_date date when age is to be computed
#' @param dec return decimal age or not
#' @examples
#' get_age("2000-01-01")
#' get_age(lubridate::as_date("2000-01-01"))
#' get_age("2000-01-01","2015-06-15")
#' get_age("2000-01-01",dec = TRUE)
#' get_age(c("2000-01-01","2003-04-12"))
#' get_age(c("2000-01-01","2003-04-12"),dec = TRUE)
get_age <- function(from_date,to_date = lubridate::now(),dec = FALSE){
  if(is.character(from_date)) from_date <- lubridate::as_date(from_date)
  if(is.character(to_date))   to_date   <- lubridate::as_date(to_date)
  if (dec) { age <- lubridate::interval(start = from_date, end = to_date)/(lubridate::days(365)+lubridate::hours(6))
  } else   { age <- lubridate::year(lubridate::as.period(lubridate::interval(start = from_date, end = to_date)))}
  age
}
moodymudskipper
  • That's a nice function. But, why did you decide to use `/dyears(1)` for `if(dec)`? Isn't age more something like a period and would require `/years(1)` ? – tjebo Mar 07 '18 at 17:44
  • 1
    It was a mistake, but your suggestion doesn't really work either, I changed it for `(lubridate::days(365)+lubridate::hours(6))` – moodymudskipper Mar 07 '18 at 18:01
  • Interesting to combine `days()` and `hours()`. May I kindly ask why you chose `hours(6)`? Ah, oops, think I got it. Turning it into decimals. Would this account for leap years though? – tjebo Mar 07 '18 at 18:06
  • 1
    The idea is that the decimal age is the "real age", that considers a constant length for a year. So to account for leap years a year should be 365.25 days long, and 0.25 days is 6 hours :). I had a wrong understanding of what `dyears` was doing. try : `lubridate::interval(start = "2000-01-01", end = "2001-01-01")/(lubridate::days(365)+lubridate::hours(6))` then `lubridate::interval(start = "2000-03-01", end = "2001-03-01")/(lubridate::days(365)+lubridate::hours(6))` – moodymudskipper Mar 07 '18 at 18:10
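The fixed-length-year idea from the comment above can also be sketched in base R, without lubridate. This is only an illustration of the 365.25-day convention; `decimal_age` is a hypothetical helper name, not part of the answer.

```r
# Decimal age using a constant year length of 365.25 days (base R only).
# decimal_age is a hypothetical name chosen for this sketch.
decimal_age <- function(from, to) {
  days <- as.numeric(difftime(as.Date(to), as.Date(from), units = "days"))
  days / 365.25
}

# The two intervals suggested in the comment:
a <- decimal_age("2000-01-01", "2001-01-01")  # spans Feb 29, 2000: 366 days
b <- decimal_age("2000-03-01", "2001-03-01")  # no leap day in between: 365 days

print(c(a, b))
```

With a constant year length, the first interval comes out slightly above 1 and the second slightly below 1, even though both cover exactly one calendar year. That is the trade-off between a "real" decimal age and a calendar age.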
2

You can do some formatting:

as.numeric(format(as.Date("01/01/2010", format="%m/%d/%Y"), format="%Y")) - 1930

With your data:

> yr <- c(1931, 1924)
> recent <- c("09/08/2005", "11/08/2005")
> as.numeric(format(as.Date(recent, format="%m/%d/%Y"), format="%Y")) - yr
[1] 74 81

Since you have your data in a data.frame (I'll assume that it's called df), it will be more like this:

as.numeric(format(as.Date(df$recent, format="%m/%d/%Y"), format="%Y")) - df$year
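A minimal end-to-end sketch of the data frame case, assuming the frame is called `df` with a numeric `year` column and a character `recent` column in month/day/year order (these names and the layout are assumptions, since the question does not show the real frame):

```r
# Assumed layout: numeric birth year plus a month/day/year date string.
df <- data.frame(
  year = c(1931, 1924),
  recent = c("09/08/2005", "11/08/2005"),
  stringsAsFactors = FALSE
)

# Pull the four-digit year out of the recent date, then subtract row by row.
recent_year <- as.numeric(format(as.Date(df$recent, format = "%m/%d/%Y"), format = "%Y"))
df$age <- recent_year - df$year

print(df$age)  # 74 81
```

Because only the birth year is known, the result can be one year too high for anyone whose birthday falls later in the year than the recent date.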
Shane
  • Works for the data I've posted here, but my data set actually has many more rows. Is there a way I could accomplish this by calling on the data frames themselves? – Brian Aug 31 '10 at 17:12
  • In the same way. Just replace recent and yr with your df columns. – Shane Aug 31 '10 at 17:24
2

Given the data in your example:

> m <- data.frame(YEAR=c("/1931", "/1924"),RECENT=c("09/08/2005","11/08/2005"))
> m
   YEAR     RECENT
1 /1931 09/08/2005
2 /1924 11/08/2005

Extract year with the strptime function:

> strptime(m[,2], format = "%m/%d/%Y")$year - strptime(m[,1], format = "/%Y")$year
[1] 74 81
eyjo
  • 1
    Why? The beauty of object oriented programming is having methods that recognize date objects so you don't have to do this. – Vince Aug 31 '10 at 19:30
1

I think this might be a bit more intuitive and requires no formatting or stripping:

as.numeric(as.Date("2002-02-02") - as.Date("1924-08-03")) / 365

gives output:

77.55342

Then you can use floor(), round(), or ceiling() to round to a whole number.
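To see how the choice of divisor interacts with rounding, here is a small sketch with made-up dates. `days_between` is a hypothetical helper introduced only for this example.

```r
# Hypothetical helper: whole days between two dates.
days_between <- function(from, to) as.numeric(as.Date(to) - as.Date(from))

# Two days before a 77th birthday, so the calendar age is still 76.
d1 <- days_between("1924-08-03", "2001-08-01")
by_365   <- floor(d1 / 365)     # 77: the accumulated leap days push it over
by_36525 <- floor(d1 / 365.25)  # 76

# Exactly on a first birthday, with no leap day in between.
d2 <- days_between("2000-03-01", "2001-03-01")
first_birthday <- floor(d2 / 365.25)  # 0, although the calendar age is 1

print(c(by_365, by_36525, first_birthday))
```

Dividing by 365 drifts upward over a long life, while dividing by 365.25 can undercount right on a birthday. If the calendar age matters, comparing month and day directly avoids both problems.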

Allen Wang
  • This does not account for leap years. – nograpes Oct 01 '15 at 17:49
  • You could do 365.25, which should be close enough. If you're looking for ages, isn't actual (number of days) age more important than calendar age? – Allen Wang Oct 02 '15 at 17:14
  • 2
    Sometimes, the actual number of days lived is perfectly fine (and perhaps better), but in other situations you really want the number of calendar years that have passed. Although two people who are 65 years old (according to the common definition) may have lived a different number of days, we often don't want to make that distinction. For example, if you were calculating if someone was eligible for retirement, nearly everyone uses whole years rather than days to make that calculation. – nograpes Oct 03 '15 at 00:56
1

Based on the previous answer, convert your columns to date objects and subtract. Some conversion of types between character and numeric is necessary:

> foo=data.frame(RECENT=c("09/08/2005","11/08/2005"),YEAR=c("/1931","/1924"))
> foo
      RECENT  YEAR
1 09/08/2005 /1931
2 11/08/2005 /1924
> foo$RECENTd = as.Date(foo$RECENT, format="%m/%d/%Y")
> foo$YEARn = as.numeric(substr(foo$YEAR,2,999))
> foo$AGE = as.numeric(format(foo$RECENTd,"%Y")) - foo$YEARn
> foo
      RECENT  YEAR    RECENTd YEARn AGE
1 09/08/2005 /1931 2005-09-08  1931  74
2 11/08/2005 /1924 2005-11-08  1924  81

Note I've assumed you have that slash in your year column.

Also, a tip for asking questions about dates: include a day that is past the twelfth so we know whether you are a month/day/year person or a day/month/year person.
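A short sketch of why the day matters: the first string below parses cleanly under both conventions and gives two different dates, while a day past the twelfth fails under the wrong format. The second string is made up for illustration.

```r
x <- "09/08/2005"

as_mdy <- as.Date(x, format = "%m/%d/%Y")  # read as September 8
as_dmy <- as.Date(x, format = "%d/%m/%Y")  # read as August 9

# A hypothetical date with a day past the twelfth: there is no month 25,
# so the month/day/year format cannot parse it.
wrong_format <- as.Date("25/12/2005", format = "%m/%d/%Y")
right_format <- as.Date("25/12/2005", format = "%d/%m/%Y")

print(c(as_mdy, as_dmy))
print(is.na(wrong_format))  # TRUE
```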

Spacedman
0

Really solid way that also supports vectors using the lubridate package:

library(lubridate)  # provides month(), day() and year()

age <- function(date.birth, date.ref = Sys.Date()) {
  if (length(date.birth) > 1 && length(date.ref) == 1) {
    date.ref <- rep(date.ref, length(date.birth))
  }

  # Encode month and day as month * 100 + day so the values sort in calendar
  # order (pasting them together would turn both Jan 15 and Nov 5 into 115)
  date.birth.monthdays <- month(date.birth) * 100 + day(date.birth)
  date.ref.monthdays <- month(date.ref) * 100 + day(date.ref)

  age.calc <- numeric(length(date.birth))

  for (i in seq_along(date.birth)) {
    if (date.birth.monthdays[i] <= date.ref.monthdays[i]) {
      # birthday already reached in the reference year
      age.calc[i] <- year(date.ref[i]) - year(date.birth[i])
    } else {
      # birthday still to come in the reference year
      age.calc[i] <- year(date.ref[i]) - year(date.birth[i]) - 1
    }
  }
  age.calc
}

This also accounts for leap years. I just check if someone has had a birthday already.
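The "has the birthday already happened" check does not strictly need a loop. Below is a minimal base-R sketch of the same idea, not the author's code: `age_vec` is a hypothetical name, and it compares zero-padded month-day keys produced by `format(x, "%m%d")`, so January 15 becomes 115 and November 5 becomes 1105.

```r
# Vectorized birthday check in base R (hypothetical helper, illustrative only).
age_vec <- function(date.birth, date.ref = Sys.Date()) {
  date.birth <- as.Date(date.birth)
  date.ref <- as.Date(date.ref)

  # Zero-padded month-day keys compare in calendar order.
  birth_key <- as.integer(format(date.birth, "%m%d"))
  ref_key <- as.integer(format(date.ref, "%m%d"))

  years <- as.integer(format(date.ref, "%Y")) - as.integer(format(date.birth, "%Y"))

  # Subtract one where the birthday is still to come in the reference year.
  years - as.integer(ref_key < birth_key)
}

res <- age_vec(c("1980-01-15", "1980-11-05"), "2018-06-01")
print(res)  # 38 37
```

A single reference date is recycled across all birth dates by R's usual vector arithmetic, so no explicit `rep()` is needed in this version.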

MS Berends