1

I am running into an issue while performing CoxPH analysis using the following sample dataset:

structure(list(Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc.
 = c("Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted/chemo combo", "Targetted Tx", "Targetted Tx",  "Targetted Tx"), Time.on.systemic.Tx =
 c("2.069815195", "2.332648871",  "2.069815195", "1.215605749",
 "2.661190965", "0.689938398", "1.839835729",  "2.858316222",
 "0.657084189", "2.529774127", "1.80698152", "3.482546201", 
 "2.891170431", "3.515400411", "2.431211499", "3.515400411",
 "1.347022587",  "5.519507187", "17.47843943", "26.90759754",
 "6.176591376", "5.979466119",  "8.246406571", "15.40862423",
 "5.749486653", "6.242299795", "5.683778234",  "6.636550308",
 "10.15195072", "10.0862423", "18.52977413", "5.749486653", 
 "10.7761807", "6.965092402"), PFS2 = c(2.595482546, 2.37, 2.069815195, 
1.412731006, 1.938398357, 0.657084189, 2.529774127, 3.219712526, 
 0.657084189, 2.529774127, 2.2, 3.482546201, 2.529774127, 3.712525667, 
 2.234086242, 3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532, 
 7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187, 
 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388, 
 18.20123203, 6.110882957, 10.61190965, 6.866529774), PFS2_event = c(1,  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1,  1, 1,
 1, 1, 1, 0, 1, 1, 0, 1, 1, 1), Binarised_Time.on.Tx.2 = c("≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months")), row.names = c(NA, -34L), class =
 "data.frame")

And here is the code I am using for this analysis:

fit1 <- coxph(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data =
 Test_Dataset) 
summary(fit1)

I receive the following warning after running this code:

Warning message: In coxph.fit(X, Y, istrat, offset, init, control, weights = weights, : Loglik converged before variable 1 ; coefficient may be infinite.

And more importantly I am receiving incorrect results, since the confidence interval goes from 0 to Inf and the co-efficient and p-values are really high. I have run this analysis for Overall Survival using the same dataset which has worked well without any issues. Any suggestions as to what might be driving this issue with respect to my PFS2 values?

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Adam
  • 59
  • 8

1 Answers1

1

This is a variant of the complete separation problem, which you can start to read about (e.g.) here.

These aren't really incorrect estimates, they're the attempt to show infinite estimates. The Wald estimates of the standard errors fail in this case (this is called the Hauck-Donner effect).

Some possible solutions:

  • you can still use anova.coxph to compare the fit to the fit of a null model and get a valid p-value that way
  • consider not dichotomizing your predictor ...
  • fit a regularized model, e.g. using the glmnet package with a ridge penalty (alpha = 0) and a small penalty

Easiest to see by plotting the data (using a Kaplan-Meier estimate):

library(ggfortify)
fit2 <- survfit(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data =
 Test_Dataset) 
autoplot(fit2)

enter image description here

All of the individuals in the "≤3.52" stratum die (fail) or are censored before the first individual in the other stratum dies ...

We can plot the fitted Cox model (with autoplot(survfit(fit))) too, although it's less obvious what's going on ...

enter image description here

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453