Calculate difference where numid is the same between datasets of differing length

Question

I have two data frames: "start.date" and "death.date". Each includes two columns: "numid" (a numeric id) and "date". "start.date" records the start of disease for each numid. "death.date" includes only those numid from "start.date" that died, with the date of death in death.date$date.

I need to calculate the difference (=survival) between start.date and death.date for those with the same numid.

This is what I wrote:

 tempi<-as.numeric(factor(start.date$numid))
 tempj<-as.numeric(factor(death.date$numid))
 for(i in tempi){
   for(j in tempj){
     surviv[i]<-ifelse(colic.date$numid[i]==death.date$numid[j],
                         death.date$date.death[j]-colic.date$date.colic[i],
                         "alive")
   }  
 }

My issue here, I think, is that surviv[i] only keeps the value from the last death.date$numid[j], but I cannot find a way out. Could anyone shed some light on this, please? There are probably easier ways to do this (it runs very slowly, even with the wrong result).

Apologies if this has been discussed somewhere; I just could not find anything that works with my data.

Cheers, Marco

score 0 · Accepted Answer · edited Jul 14 '14 at 09:18

Here's my stab at it: I used a custom function to generate dates and then created two data.frames. I then found the ids common to both data.frames using intersect, and used difftime to find the difference in dates. Your code is slow because it uses nested for loops, and it gives the wrong result because the inner loop overwrites surviv[i] on every pass, so only the comparison with the last j is kept. Read the resources on this page for vectorizing your code.

I used intersect, though have a look at %in% as well to find common items.
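As a small illustration of how these differ (the ids below are made up for this example): intersect returns the shared values, %in% returns a logical mask in the order of its left operand, and match returns positions, which is what you would likely want when lining rows up by id.

```r
# Toy ids, invented for illustration only
a <- c(3, 7, 1, 9)
b <- c(9, 3, 4)

common <- intersect(a, b)   # values present in both vectors
mask   <- a %in% b          # logical vector, one entry per element of a
pos    <- match(common, b)  # position of each common id within b

common
mask
pos
```

Note that a mask from %in% keeps the row order of the vector it is applied to, so two masks built on differently ordered id columns do not pair up the same ids.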

# Function to get some dates, using a uniform distribution,
# thanks to Dirk Eddelbuettel
# Original at http://stackoverflow.com/a/14721124/2747709
unif.dates <- function(N, start = "2012/01/01", end = "2012/12/31") {
  start <- as.POSIXct(as.Date(start))
  end <- as.POSIXct(as.Date(end))
  dt <- as.numeric(difftime(end, start, units = "secs"))  # span in seconds
  ev <- sort(runif(N, 0, dt))  # N sorted random offsets within the span
  start + ev  # return the generated dates
}

# Generating some random ids and dates and assigning them to data.frames
start.date <- data.frame(numid = sample(25, 15),
                         date = unif.dates(15, start = "2012/06/01", end = "2012/12/31"))
death.date <- data.frame(numid = sample(25, 15),
                         date = unif.dates(15, start = "2012/08/01", end = "2013/02/28"))

# Get common ids between data.frames
common.ids <- intersect(death.date$numid, start.date$numid)

# Calculate time difference; match() lines up the rows of both data.frames by id.
# Units are set to days here, read ?difftime for other units
z <- difftime(death.date$date[match(common.ids, death.date$numid)],
              start.date$date[match(common.ids, start.date$numid)],
              units = "days")
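An alternative sketch, not part of the original answer: merge() can do the id matching and also keep the ids that have no death date, which is what the "alive" branch in the question was after. The toy data and the surviv and status column names are assumptions made for this example.

```r
# Toy data, invented for illustration; column names follow the question
start.date <- data.frame(
  numid = c(1, 2, 3, 4),
  date  = as.POSIXct(c("2012-06-01", "2012-06-15", "2012-07-01", "2012-07-20"),
                     tz = "UTC")
)
death.date <- data.frame(
  numid = c(3, 1),
  date  = as.POSIXct(c("2012-07-11", "2012-06-11"), tz = "UTC")
)

# Left join: keep every numid from start.date, NA where no death is recorded
both <- merge(start.date, death.date, by = "numid",
              all.x = TRUE, suffixes = c(".start", ".death"))

# Survival in days; NA means there is no death date for that numid
both$surviv <- as.numeric(difftime(both$date.death, both$date.start,
                                   units = "days"))
both$status <- ifelse(is.na(both$surviv), "alive", "dead")
both
```

Keeping surviv numeric with NA for survivors, and the label in a separate column, avoids mixing numbers and the string "alive" in one vector, which would coerce everything to character.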

Thank you @Infominer for the suggestion. I had used %in% initially but could not make it work in this instance; in the end your code worked for me. One of the issues was that, even though I had stored the dates as POSIXct in the original datasets, R converted them back to numbers (not sure why) when I used cbind to merge the date columns from the original datasets. I got around it by first cbind-ing the columns with the dates converted using as.character(), and then converting them back with as.POSIXct() just before using difftime(). Thanks for your help! - MarcoD, Jan 15 '14 at 12:40
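The conversion to numbers described in this comment is most likely because cbind() on plain vectors builds a matrix, and a matrix cannot carry the POSIXct class, so only the underlying seconds remain. A minimal sketch with two made-up dates, assuming the columns were bound as vectors:

```r
# Two example dates, invented for illustration
d1 <- as.POSIXct("2012-06-01 00:00:00", tz = "UTC")
d2 <- as.POSIXct("2012-08-01 00:00:00", tz = "UTC")

m <- cbind(start = d1, death = d2)  # builds a numeric matrix; the date class is lost
is.numeric(m)

df <- data.frame(start = d1, death = d2)  # data.frame keeps each column's class
class(df$start)

days <- as.numeric(difftime(df$death, df$start, units = "days"))
days
```

Building the combined table with data.frame() or merge() instead of cbind() on vectors avoids the round trip through as.character().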
