
I am trying to add two new columns to a data frame: people's ages and their age groups (5-year intervals), both derived from their dates of birth. For example, the current data frame is:

Person      Date of Birth 
A             1/2/2000
B             3/2/1998
C             4/5/2008

The expected outcome is:

Person      Date of Birth     Age   Age-Group
A             1/2/2000        18    15-20
B             3/2/1990        28    25-30
C             4/5/2008        10    5-10

What is the most efficient way to do this for a large data set? Thanks

  • Can you share the non-working code you tried? Also have a look at [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Jun 28 '18 at 09:37

1 Answer


Something like this? BTW, I slightly adjusted the age groups from your example: using 5-10 and 15-20 would imply a separate 11-14 group as well, which would seem odd to me.

df <- read.table(text = "
Person      DateofBirth 
A             1/2/2000
B             3/2/1998
C             4/5/2008", header = TRUE)

library(lubridate)

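# Age in whole years as of today; dates are parsed as day/month/year
df$age <- interval(as.Date(df$DateofBirth, "%d/%m/%Y"), Sys.Date()) %/% years(1)
# cut() intervals are right-closed by default, so (15,20] is labelled "16-20"
df$agegroup <- cut(df$age, seq(5,30,5), c("5-10", "11-15", "16-20", "21-25", "26-30"))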
df

  Person DateofBirth age agegroup
1      A    1/2/2000  18    16-20
2      B    3/2/1998  20    16-20
3      C    4/5/2008  10     5-10
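
The ages above depend on `Sys.Date()`, so they change depending on when the code is run. Here is a minimal sketch of the same calculation with a fixed reference date (`ref_date` is an assumption added for reproducibility, not part of the original code). It also prints the intervals that `cut()` builds by default, which may help when choosing labels: the lower bound of each interval is excluded, so an age of exactly 5 falls outside `seq(5,30,5)` and becomes `NA`.

```r
library(lubridate)

df <- data.frame(
  Person      = c("A", "B", "C"),
  DateofBirth = c("1/2/2000", "3/2/1998", "4/5/2008"),
  stringsAsFactors = FALSE
)

# Assumption: a fixed reference date instead of Sys.Date(),
# so the computed ages do not drift over time
ref_date <- as.Date("2018-06-28")

# Parse day/month/year strings, then count completed years up to ref_date
dob    <- as.Date(df$DateofBirth, format = "%d/%m/%Y")
df$age <- interval(dob, ref_date) %/% years(1)

# Without labels, cut() names each level after its interval, e.g. "(5,10]";
# values on the excluded lower edge (here 5) are returned as NA
boundary_check <- cut(c(5, 10, 11, 30), breaks = seq(5, 30, 5))
print(boundary_check)
```

If age 5 should be included in the first group, `include.lowest = TRUE` or a lower first break (for example `seq(0,30,5)`) would be two ways to handle it.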

If you have many more age groups, you could also generate the labels (the last argument to cut) programmatically, like this:

df1 <- data.frame(age = 1:100)
df1$agegroup <- cut(df1$age, seq(0,100,5), paste0(seq(1,96, 5), "-", seq(5,100,5)))
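
As a possible alternative for large data sets, the same labels can be computed arithmetically with integer division instead of `cut()`. This is only a sketch: the names `width`, `lower` and `agegroup_arith` are hypothetical, and whether it is actually faster has not been benchmarked here. The result is a character vector rather than a factor.

```r
age   <- 1:100
width <- 5  # assumed group width in years

# Lower bound of each group: ages 1-5 map to 1, ages 6-10 map to 6, and so on
lower <- ((age - 1) %/% width) * width + 1
agegroup_arith <- paste0(lower, "-", lower + width - 1)

# Ages below 1 have no group in this scheme, mirroring the NA that cut() returns
agegroup_arith[age < 1] <- NA

# Cross-check against the cut()-based labels from the answer
agegroup_cut <- cut(age, seq(0, 100, 5),
                    paste0(seq(1, 96, 5), "-", seq(5, 100, 5)))
stopifnot(all(agegroup_arith == as.character(agegroup_cut)))
```

If a factor is needed downstream, `factor(agegroup_arith, levels = ...)` with the levels in the desired order would restore that.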
Lennyy