I am new to R and am finding it difficult to generate a series of rows where each generated row has a calculated date.

For example, going from a dataset like this:

Name  date_birth
Greg  01/02/2015
Fred  02/02/2015

...to generate the following:

Name date_birth age date_atage
Greg 01/02/2015   0     01/02/2015
Greg 01/02/2015   1     02/02/2015
Greg 01/02/2015   2     03/02/2015
Fred 02/02/2015   0     02/02/2015
Fred 02/02/2015   1     03/02/2015
Fred 02/02/2015   2     04/02/2015

I have been studying sites like R-bloggers, general instructional blogs and this site, and I have been trying to work out a loop involving the seq function, so that for each individual (e.g. Greg, Fred, etc.) the dates are calculated and placed in their own rows. Your first thought may be that this is simpler to do in Excel, but it isn't, as I need to repeat this for over 800 individuals (i.e. not just Greg and Fred), and for up to 300 days of age.

ElTenero
  • Where are you getting age from, or does it just increment by one? – MikeRSpencer Jul 12 '16 at 08:58
  • Yes, it simply increments by 1 day. But I would like the generated list to go from 1 day of age to 300 days of age. So basically, generate 300 new rows for each individual. – ElTenero Jul 12 '16 at 09:02
  • Also see this solution: https://stackoverflow.com/questions/14450384/create-a-vector-of-all-days-between-two-dates – DirtStats Feb 12 '19 at 22:15
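
For reference, the date sequence the question asks about can be built with base R's seq() on a Date object. A minimal sketch for a single individual, assuming the dd/mm/yyyy format from the question; the names birth, dates and formatted are introduced here for illustration only:

```r
# Parse one birth date from the question's dd/mm/yyyy format
birth <- as.Date("01/02/2015", format = "%d/%m/%Y")

# Three consecutive days starting at the birth date (ages 0, 1 and 2)
dates <- seq(birth, by = "day", length.out = 3)

# Convert back to the original string format
formatted <- format(dates, "%d/%m/%Y")
print(formatted)
# [1] "01/02/2015" "02/02/2015" "03/02/2015"
```

Setting length.out to 301 would cover ages 0 to 300 days.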

3 Answers


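We can use data.table. For each Name, build a sequence of dates starting at date_birth, then add an age column that counts up from 0 (set length.out to 301 for ages 0 to 300):

 library(data.table)
 # df1 is the example data, with date_birth stored as a "dd/mm/yyyy" string
 setDT(df1)[, .(date_birth, date_at_age = format(seq(as.Date(date_birth, 
      "%d/%m/%Y"), length.out=3, by = "1 day"), "%d/%m/%Y")) ,
           by = Name][,age := seq_len(.N)-1 , by = Name][]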
#   Name date_birth date_at_age age
#1: Greg 01/02/2015  01/02/2015   0
#2: Greg 01/02/2015  02/02/2015   1
#3: Greg 01/02/2015  03/02/2015   2
#4: Fred 02/02/2015  02/02/2015   0
#5: Fred 02/02/2015  03/02/2015   1
#6: Fred 02/02/2015  04/02/2015   2
akrun
  • I had success with the above. I also attempted to calculate age of pregnancy (and associated dates). Assuming pregnancy starts 290 days prior to birth, pregnancy age (days) = 290 + (date in pregnancy - birth date). I used the below code to calculate pregnancy age, but resulting values are negative. Is there a way to make them positive? library(data.table) setDT(df)[, .(date_birth, date_at_pregage = format(seq(as.Date(date_birth, "%d/%m/%Y"), length.out=291, by = "-1 day"), "%d/%m/%Y")) , by = name][,preg_age := seq_len(.N)-291 , by = name][] – ElTenero Jul 13 '16 at 01:32
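
One possible way to get positive pregnancy ages is to count forward from the assumed start of pregnancy instead of backward from birth. The following is only a sketch in base R, not the answer author's method. It assumes the 290-day pregnancy length stated in the comment, and the names preg_days, birth and preg are introduced here for illustration:

```r
# Hypothetical example data; column names follow the comment above
df <- data.frame(name = c("Greg", "Fred"),
                 date_birth = c("01/02/2015", "02/02/2015"),
                 stringsAsFactors = FALSE)

preg_days <- 290  # assumed pregnancy length in days, taken from the comment
birth <- as.Date(df$date_birth, format = "%d/%m/%Y")

# One row per individual per day of pregnancy, with ages 0..290
preg <- data.frame(
  name       = rep(df$name, each = preg_days + 1),
  date_birth = rep(df$date_birth, each = preg_days + 1),
  preg_age   = rep(0:preg_days, times = nrow(df)),
  stringsAsFactors = FALSE
)

# preg_age = 290 + (date - birth), rearranged: date = birth - 290 + preg_age
preg$date_at_pregage <- format(
  rep(birth, each = preg_days + 1) - preg_days + preg$preg_age,
  "%d/%m/%Y"
)
```

With this layout preg_age is 0 on the first day of pregnancy and 290 on the birth date, so no negative values arise.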

This is a long-form way of getting to the same place that data.table will take you.

Have a look at how dates work in R. I've taken your original format and converted it to a Date with as.Date (the "Make date format" step). See http://strftime.org/ for more format codes; R's own list is in ?strptime.

Set some dummy data:

df = data.frame(name=c("Gregg", "Joan"), DOB=c("01/02/2015", "02/02/2015"), stringsAsFactors=FALSE)

Make date format:

df$DOB = as.Date(df$DOB, format="%d/%m/%Y")

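Loop over each name, making 301 rows (ages 0 to 300 days) and adding the age in days to the DoB:

df = lapply(seq_len(nrow(df)), function(i){
   # Repeat this person's name and DoB once per age
   x = data.frame(name=rep(df[i, 1], times=301),
                  DoB=rep(df[i, 2], times=301),
                  age=0:300)
   # Date + number of days gives the date at that age
   x$newDate = x$DoB + x$age
   x
})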

Convert list to a data frame:

df = do.call("rbind.data.frame", df)

Check output:

head(df)
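
The same result can also be reached without an explicit loop by repeating row indices. A minimal sketch in base R, assuming the same two-row dummy data; max_age, idx and out are names introduced here for illustration:

```r
# Same dummy data as above, with DOB already converted to Date
df <- data.frame(name = c("Gregg", "Joan"),
                 DOB = as.Date(c("01/02/2015", "02/02/2015"),
                               format = "%d/%m/%Y"),
                 stringsAsFactors = FALSE)

max_age <- 300  # hypothetical parameter: oldest age in days

# Repeat each row index once per age (0..max_age), then expand the data frame
idx <- rep(seq_len(nrow(df)), each = max_age + 1)
out <- df[idx, ]
rownames(out) <- NULL

# Attach the ages and add them to the birth dates
out$age <- rep(0:max_age, times = nrow(df))
out$newDate <- out$DOB + out$age

head(out)
```

This avoids building and binding one data frame per person, which may matter with 800 or more individuals.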
MikeRSpencer

Setup

df <- cbind(c("Greg","Fred"),c("01/02/2015","02/02/2015"))
max_age <- 2
start_at <- 0

Script

library(lubridate)
n_ages <- max_age - start_at + 1 # rows per individual
new_df <- data.frame(rep(NA, n_ages*dim(df)[1]))
new_df[,1] <- rep(df[,1],each=n_ages) #Names
new_df[,2] <- rep(df[,2],each=n_ages) #Birth date
new_df[,3] <- rep(seq(from=start_at,to=max_age),dim(df)[1]) #Age
new_df[,4] <- dmy(new_df[,2]) + days(new_df[,3]) #Date at age
colnames(new_df) <- c("names","date_birth","age","date_at_age")
Simon