
I am using the lifelines package to do Cox regression. After fitting the model, I checked the CPH assumptions for possible violations, and it returned some problematic variables along with suggested solutions.

One of the solutions that I would like to try is the one suggested here: https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Introduce-time-varying-covariates

However, the example there uses CoxTimeVaryingFitter which, unlike CoxPHFitter, does not provide a concordance score, which would help me gauge model performance. Additionally, CoxTimeVaryingFitter does not have a check_assumptions feature. Does this mean that by putting the data into episodic format, all the assumptions are automatically satisfied?

Alternatively, after reading a SAS textbook on survival analysis, it seems their solution is to create the interaction term directly (multiplying the problematic variable by the survival time) without converting the data to episodic format (as shown in the link). This way, I was hoping to keep using CoxPHFitter for its model scoring capability.

However, after trying this alternative, when I call check_assumptions again on the model with the time-interaction variable, the CPH assumption is violated for the time-interaction variable itself.

Now I am torn between:

  1. Using CoxTimeVaryingFitter without knowing what the model performance is (seems like a bad idea)
  2. Using CoxPHFitter, but the assumption is violated on the time-interaction variable (which inherently does not seem to fix the problem)

Any help resolving this confusion is greatly appreciated.

gosok
  • I'll see if I can help: lifelines support for time-varying data is limited atm (on the roadmap however). Can I ask if you are interested more in prediction (sounds like it), or inference? – Cam.Davidson.Pilon Mar 16 '20 at 20:35
  • Thank you for your help! I guess it is possible to reframe the problem into a simpler numerical prediction, but if possible, I would like to explore this method due to its flexibility in providing future forecast at any given timestep. Since I am trying to provide a solution for something that has pretty uncertain supply chain timing, I figure having this flexibility helps. – gosok Mar 16 '20 at 21:53
  • This would not seem to be on-topic under the rules laid out for this forum. Voting to migrate to CrossValidated.com – IRTFM Mar 16 '20 at 23:13
  • Did you find how to evaluate the model performance? (for CoxTimeVaryingFitter) – Luis Leal Dec 19 '20 at 07:52

1 Answer


Here is one suggestion:

  1. If you choose CoxTimeVaryingFitter, you need some way to evaluate the quality of your model. Here is one way. Take the fitted regression coefficients B and write down your model. I'll write it as S(t;x;B), where S is the estimated survival function, t is the time, and x is a vector of covariates (age, wage, education, etc.). Every individual i has a covariate vector x_i, so you have a survival function for each individual. Consequently, you can predict which individual will 'fail' first, which second, and so on. This produces a predicted ranking of survival. You also know the real ranking, since you know the failure times (times-to-event). Now count the fraction of pairs of individuals whose predicted ordering agrees with their observed ordering. In essence, you would be estimating the concordance.
  2. If you opt for CoxPHFitter, I don't think it was meant to be used with time-varying covariates. Instead, you could try one of two other approaches. One is to stratify on the problematic variable, i.e., cph.fit(dataframe, time_column, event_column, strata=['your variable to stratify']). The downside is that you no longer obtain a hazard ratio for that variable. The other approach is to use splines. Both of these methods are explained here.
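
To make the pair-counting idea in point 1 concrete, here is a minimal sketch in plain Python. It is not lifelines' own implementation: the function name `harrell_c_index` is made up for illustration, and it assumes you have reduced each individual to a single risk score (for example, the linear predictor x_i · B evaluated at one chosen time point, which is itself a simplification when covariates vary over time). Pairs with tied event times are skipped for simplicity, and a pair is only counted when the shorter of the two times is an observed event, since otherwise the true ordering is unknown.

```python
from itertools import combinations


def harrell_c_index(durations, events, risk_scores):
    """Fraction of comparable pairs ranked correctly by the risk score.

    durations   : observed time for each individual
    events      : 1 if the event was observed, 0 if censored
    risk_scores : higher score = predicted to fail sooner (assumption)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the individual with the shorter observed time
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranked the earlier failure as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented for illustration): the third individual is censored.
durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(durations, events, risk_scores))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. lifelines also ships `lifelines.utils.concordance_index`, which, as far as I know, expects scores where higher means longer survival, so a risk score would need to be negated before being passed in.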
JR2