11

The muhaz package estimates the hazard function from right censored data using kernel smoothing methods. My question is, is there any way to obtain confidence intervals for the hazard function that muhaz calculates?

options(scipen=999)
library(muhaz)
data(ovarian, package="survival")
attach(ovarian)
fit1 <- muhaz(futime, fustat)
plot(fit1, lwd=3, ylim=c(0,0.002))

muhaz hazard function

In the above example the muhaz.object fit has some entries fit1$msemin, fit1$var.min, fit1$haz.est however their length is half of the fit1$haz.est.

Any ideas if it is possible to extract confidence intervals for the hazard function?

EDIT: I tried the following based with what @user20650 suggested

options(scipen=999)
library(muhaz)
data(ovarian, package="survival")
fit1 <- muhaz(ovarian$futime, ovarian$fustat,min.time=0, max.time=744)


h.df<-data.frame(est=fit1$est.grid, h.orig=fit1$haz.est)

for (i in 1:10000){
d.s.onarian<-ovarian[sample(1:nrow(ovarian), nrow(ovarian), replace = T),]
d.s.muhaz<-muhaz(d.s.onarian$futime, d.s.onarian$fustat, min.time=0, max.time=744 )
h.df<-cbind(h.df, d.s.muhaz$haz.est)
}


h.df$upper.ci<-apply(h.df[,c(-1,-2)], 1,  FUN=function(x) quantile(x, probs = 0.975))
h.df$lower.ci<-apply(h.df[,c(-1,-2)], 1,  FUN=function(x) quantile(x, probs = 0.025))
plot(h.df$est, h.df$h.orig, type="l", ylim=c(0,0.003), lwd=3)
lines(h.df$est, h.df$upper.ci,  lty=3, lwd=3)
lines(h.df$est, h.df$lower.ci,  lty=3, lwd=3)

Setting max.time seems to works, every bootstrap sample hast the same estimating grid points. However the CI obtained, make little sense. Normally I would expect that the intervals are narrow at t=0 and get wider with time (less information, more uncertainty) but the intervals obtained seem to be more or less constant with time.

enter image description here

IRTFM
  • 258,963
  • 21
  • 364
  • 487
ECII
  • 10,297
  • 18
  • 80
  • 121
  • 2
    Could you bootstrap it?. This `with(fit1, plot(est.grid, haz.est, type="l", lwd=3, ylim=c(0,0.002)))` gives the same plot, so you will need estimates of `haz.est` at the same timepoints as for `fit1`. However, as you resample and refit the `muhaz` model the timepoints change, From a quick try, I think you can force `est.grid` to be at the same timepoints for each resample if you set the `min.time` and `max.time` to be the same as in the original fit. ie `with(dat, muhaz(futime, fustat, min.time=0, max.time=744))`, where `dat` is the bootstrap data. – user20650 Feb 22 '15 at 21:11
  • Setting max.time seems to works, every bootstrap sample hast the same estimating grid points. However the CI obtained, make little sense. Normally I would expect that the intervals get wider with time (less information, more uncertainty) but the intervals obtained seem to be more or less constant with time. – ECII Apr 23 '15 at 18:45
  • Related: https://stackoverflow.com/q/29728795/6574038 – jay.sf Dec 28 '18 at 12:12

1 Answers1

5

Bootstrapping provides the answer as the commenter suggested. Your intuitions are right that you should expect the CIs to widen as the number at risk decreases. However, this effect is going to be diminished by the smoothing process and the longer the interval over which smoothing is applied the less you should notice the change in size of the CI. Try smoothing over a sufficiently short interval and you should notice the CIs widen more noticeably.

As you may find, these smoothed hazard plots can be of very limited use and are highly sensitive to how the smoothing is done. As an exercise, it is instructive to simulate survival times from a series of Weibull distributions w/ the shape parameter set to 0.8, 1.0, 1.2, and then look at these smoothed hazard plots and try to categorize them. To the extent that these plots are informative, it should be fairly easy to tell the difference between those three curves based on the trend rate of the hazard function. YMMV, but I haven't been very impressed with the results when I've done this tests with reasonable sample sizes consistent with clinical trials in oncology.

As an alternative to smoothed hazard plots, you might try fitting piecewise exponential curves using the method of Han et al. (http://www.ncbi.nlm.nih.gov/pubmed/23900779) and bootstrapping that. Their algorithm will identify the break points at which there is a statistically significant change in the hazard rate and may give you a better sense of the trend in the hazard rate than smoothed hazard plots. It will also avoid the somewhat arbitrary but consequential choice of smoothing parameters.

jrdnmdhl
  • 1,935
  • 18
  • 26
  • Thank you for your elaborate answer. Do you know if there is an R package for that procedure? – ECII May 26 '15 at 06:15
  • I don't believe they have published their R code yet. You may be able to get the code from the authors if you explain how you will use it and ask nicely.Alternately, their paper should explain the method well enough to make it an interesting project. – jrdnmdhl May 26 '15 at 14:08
  • 1
    https://cran.r-project.org/web/packages/RPEXE.RPEXT/index.html – IRTFM Apr 02 '21 at 22:00
  • I should have included a label for that link. It is an implementation by Han of the methods used in Han, Schell, and Kim (2012 , 2014 ), and Han et al. (2016 ). The 2014 paper cited above is free from Stat in Med – IRTFM Jan 09 '22 at 01:52