I'm looking at the effect the weekend may have on an individual's survival and so I'm trying to convert my data into a time-dependent structure with one row per interval. It will probably be a Cox PH model with DschDT (Discharge Date) as the censoring date. Either a patient is discharged alive (right-censored) or dies within hospital.
The data looks like this, where DIH is my censoring variable (0,1)
`structure(list(Age = c(28L, 77L, 92L, 28L, 59L, 7L), Sex = structure(c(1L,
2L, 1L, 1L, 2L, 2L), .Label = c("F", "M"), class = "factor"),
Care.type = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Acute",
"Organ.Procurement", "Geriatric.Eval.Mgt", "Psychogeriatric",
"Maintenance", "Rehab", "Palliative"), class = "factor"),
AdmDT = structure(c(1396282680, 1396311600, 1396329780, 1396331040,
1396343940, 1396348080), class = c("POSIXct", "POSIXt"), tzone = ""),
DschgDT = structure(c(1396288800, 1396335600, 1397721600,
1396338600, 1396390200, 1396359120), class = c("POSIXct",
"POSIXt"), tzone = ""), DIH = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Age",
"Sex", "Care.type", "AdmDT", "DschgDT", "DIH"), row.names = c(1L,
7L, 8L, 9L, 10L, 11L), class = "data.frame")`
For example, I might have a patient who is admitted to hospital on a Wednesday and is discharged alive the following Thursday. In this case there will be three rows for this patients episode. One for the Wednesday-Friday, one for Saturday-Sunday and one for Monday-Thursday, all inclusive.
I've managed to identify the weekends within a time period using this function.
getDuration <- function(d1, d2,fmt="%Y-%m-%d %H%M") {
myDays <- seq.Date(to = as.Date(d2, format=fmt),
from = as.Date(d1, format =fmt),
by = 1)
myDays[which(is.weekend(myDays))]
}
dat<-mapply(getDuration,AdmDT,DschgDT)
> head(clip)
ID StartDate EndDate Start Time Event WeekendStart1 WeekendEnd1 WeekendStart2 WeekendEnd2
1 1 9/08/2013 16/08/2013 0 7 0 1 3 0 0
2 2 9/12/2013 12/12/2013 0 3 0 0 0 0 0
3 3 9/01/2014 17/01/2014 0 8 1 2 4 0 0
After identifying where the weekends occur between dates I would like to split the time acording to weekends . For this example the resulting data would look like:
clip2
ID StartDate EndDate Start Time Event Weekend
1 1 9/08/2013 16/08/2013 0 1 0 0
2 1 9/08/2013 16/08/2013 1 3 0 1
3 1 9/08/2013 16/08/2013 3 7 0 0
4 2 9/12/2013 12/12/2013 0 3 0 0
5 3 9/01/2014 17/01/2014 0 2 0 0
6 3 9/01/2014 17/01/2014 2 4 0 1
7 3 9/01/2014 17/01/2014 4 8 1 0
However I can't seem to find a way to split the time intervals in an efficent way, survSplit
and tmerge
from the survival
package don't seem to have functionality to do this.
Can anyone give me some ideas, other than running a big ugly loop?
Update. Well I managed to do it after lots of head scratching. For those who are interested. This function finds what is considered a weekend by a hospital, ie. beginning on Friday night and ending early Monday morning. Of course you can edit to suit. This function returns Friday and Sunday, so that you can split on these days.
is.weekend<-function (x)
{
library(chron)
if (!inherits(x, "dates"))
x<-as.chron(as.character(x))
v <- month.day.year(x)
h<-hours(x)
out <- day.of.week(v$month, v$day, v$year) + 1
# 1 is Sunday and 7 is Saturday, h is hours
x<-((out == 6 & h >= 18) | out==7|out==1|(out == 2 & h < 6))
return(x)
}
This is a simpler version of above to get the intervals
Basic function to identify the number of weekends starting on Saturday and ending on Sunday. d1 and d2 are admission and discharge date/times respectively.
getDuration <- function(d1, d2) {
myDays <- seq(d1,d2,by="hour")
myDays[which(is.weekend(myDays))]
}
This function makes the time sequence for each record
survSeq.dh<-function(a,w){
aa<-sort(c(a,as.POSIXct(w)))
aa<-diff(aa)
units(aa)<-"hours"
aa<-as.numeric(aa)
aa<-cumsum(aa)
#Identify the start and end of weekends
aa1<-which(diff(aa)!=1)
aa1<-sort(c(aa1,aa1+1))
aa1<-c(aa[1],aa[aa1],aa[length(aa)])/24
}
A bit of housekeeping
#Make a survSplit object
#Create a start and stop time
dat$start<-0
dat$time<-as.numeric(dat$separation_datetime-dat$admission_datetime)/(60*24)
Event variable
dat$DIH<-dat$mode_of_separation=="Died in hospital"
The latest version of survival::survSplit creates a Surv object and this slows down the process considerably, so I use the old version.
New survSplit function in survival package 2.39-2 is too slow.
survSplit2<-function (data, cut, end, event, start, id = NULL, zero = 0,
episode = NULL)
{
cut <- sort(cut)
ntimes <- length(cut)
n <- nrow(data)
newdata <- lapply(data, rep, ntimes + 1)
endtime <- rep(c(cut, Inf), each = n)
eventtime <- newdata[[end]]
if (start %in% names(data))
starttime <- data[[start]]
else starttime <- rep(zero, length.out = n)
starttime <- c(starttime, pmax(starttime, rep(cut, each = n)))
epi <- rep(0:ntimes, each = n)
status <- ifelse(eventtime <= endtime & eventtime > starttime,
newdata[[event]], 0)
endtime <- pmin(endtime, eventtime)
drop <- starttime >= endtime
newdata <- do.call("data.frame", newdata)
newdata[, start] <- starttime
newdata[, end] <- endtime
newdata[, event] <- status
if (!is.null(id))
newdata[, id] <- rep(rownames(data), ntimes + 1)
if (!is.null(episode))
newdata[, episode] <- epi
newdata <- newdata[!drop, ]
newdata
}
Then run in a script
Find weekend/after hours durations for each patient record
xx.s<-mapply(getDuration,dat$admission_datetime,dat$separation_datetime))
Define start and stop times for each weekend stay
xx.surv<-mapply(survSeq,dat$admission_datetime,xx.s)
Put the lot into a loop (sorry)
lengthx<-dim(dat)[1]
dat.l<-list()
for(i in 1:lengthx){
print(i)
dat.l[[i]]<-survSplit2(dat[i,],cut=xx.surv[[i]],end="time",start="start",event="DIH")
}
library(data.table)
dat.l<-data.frame(rbindlist(dat.l))
So now I have the basis of a way develop a time-dependent model that allows the person's hospital stay to switch between hazard functions as their stay alternates between the weekend and week day.
e.g. coxph(Surv(start,time,DIH)~DayOfWeek)