Feature (Covariates) selection in CoxPHFitter, Lifelines Survival Analysis

Question

I am using this model in Python for survival analysis:

from lifelines import CoxPHFitter

Unfortunately, I do not know how to loop over all covariates (features), run the regression on each one individually for the purpose of feature selection, and save the results. I am trying the script below:

`def fit_and_score_features2(X):
    y=X[["Status","duration_yrs"]]
    X.drop(["duration_yrs", "Status"], axis=1, inplace=True)
    n_features = X.shape[1]
    scores = np.empty(n_features)
    m = CoxPHFitter()

    for j in range(n_features):
       Xj = X.values[:, j:j+1]
       Xj=pd.merge(X, y,  how='right', left_index=True, right_index=True)
       m.fit(Xj, duration_col="duration_yrs", event_col="Status", show_progress=True)
       scores[j] = m._score_
    return scores`

Unfortunately, it returns this error:

ValueError Traceback (most recent call last)
in ()
      1 #Trying the function above
----> 2 scores = fit_and_score_features2(sample)
      3 pd.Series(scores, index=features.columns).sort_values(ascending=False)

in fit_and_score_features2(X)
     15 Xj=pd.merge(X, y, how='right', left_index=True, right_index=True)
     16 m.fit(Xj, duration_col="duration_yrs", event_col="Status", show_progress=True)
---> 17 scores[j] = m._score_
     18 return scores

ValueError: setting an array element with a sequence.

Thank you in advance.

Why are you using `_score_` - that's a hidden variable, and it does not represent any kind of accuracy performance? `score_` however is a measure of accuracy. — Cam.Davidson.Pilon, Nov 25 '18 at 15:58
Oh, yes you are right, but it still does not work properly. The algorithm doesn't save individual values for each variable. Return of function: X1 0.523545 X2 0.523545 X3 0.523545 X4 0.52354 — Antonio Dichev, Nov 25 '18 at 16:16

Antonio Dichev · Answer 1 · 2018-11-25T16:52:16.780

I think I was able to debug it with your help (@Cam.Davidson.Pilon). Thanks a lot. In my opinion, this is the correct script:

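`import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def fit_and_score_features2(X):
    # Split off the outcome columns without mutating the caller's DataFrame
    y = X[["Status", "duration_yrs"]]
    X = X.drop(["duration_yrs", "Status"], axis=1)
    n_features = X.shape[1]
    scores = np.empty(n_features)
    m = CoxPHFitter()

    for j in range(n_features):
        # Keep column j as a one-column DataFrame so its name survives the merge
        Xj = X.iloc[:, j:j+1]
        Xj = pd.merge(Xj, y, how='right', left_index=True, right_index=True)
        m.fit(Xj, duration_col="duration_yrs", event_col="Status", show_progress=True)
        scores[j] = m.score_  # one scalar score per single-covariate model
    return scores`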
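The two changes relative to the question's code can be illustrated without fitting any model. The sketch below uses hypothetical toy data (the values are made up; only the column names follow the question). It shows that merging `X` instead of `Xj` puts every covariate into each fit, which would explain the identical scores, and that assigning an array into one slot of a NumPy float array raises the `ValueError` from the question. That `_score_` held an array rather than a scalar is an assumption based on the error message.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
X = pd.DataFrame({"X1": [1.0, 2.0, 3.0], "X2": [0.5, 0.1, 0.9]})
y = pd.DataFrame({"Status": [1, 0, 1], "duration_yrs": [2.0, 5.0, 1.5]})

j = 0

# Question's version: the full X is merged, so every fit sees all covariates.
wrong = pd.merge(X, y, how="right", left_index=True, right_index=True)

# Answer's version: only column j is merged with the outcome columns.
Xj = X.iloc[:, j:j + 1]
right = pd.merge(Xj, y, how="right", left_index=True, right_index=True)

print(list(wrong.columns))
print(list(right.columns))

# Storing an array in a single slot of a float array fails the same way
# as `scores[j] = m._score_` did (assuming `_score_` was array-valued).
scores = np.empty(2)
message = None
try:
    scores[0] = np.array([0.1, 0.2])
except ValueError as err:
    message = str(err)
print(message)
```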

score 0 · Answer 2 · answered Jul 06 '22 at 15:38

For lifelines version 0.27.0, replace `m.score_` with `m.score(Xj)` if you want the log-likelihood, or with `m.score(Xj, scoring_method='concordance_index')` if you want the concordance index.

Gisel Hernandez Chavez
