I have a pipeline that runs preprocessing followed by a Random Survival Forest from the scikit-survival package. I am trying to score it with the as_concordance_index_ipcw_scorer wrapper class from sksurv.metrics.
My pipeline looks like the following:
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='median')),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Index(['IntVar1', 'IntVar2', 'IntVar3',
                                                         'IntVar4'],
                                                        dtype='object')),
                                                 ('cat',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('onehot',
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Index(['CharVar1', 'CharVar2', 'CharVar3'], dtype='object'))])),
                ('randomsurvivalforest',
                 RandomSurvivalForest(max_features='sqrt',
                                      min_samples_leaf=0.005,
                                      min_samples_split=0.01, n_estimators=150,
                                      n_jobs=-1, oob_score=True,
                                      random_state=200))])
This is the Python code that builds and fits the pipeline:
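from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.util import Surv

print("Importing global DF")
## df is loaded at this point (not shown); its last two columns are AliveStatus and Target_Age
print("Creating X & Y set")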
X = df.iloc[:,:-2].copy()
y = Surv.from_dataframe("AliveStatus","Target_Age",df.iloc[:,-2:].copy()) ## Creates structured array for Scikit Surv
print("Defining feature categories by data type")
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns
print("Splitting dataset")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5) #SkLearn splitter
print("Defining preprocessing steps using SciKitLearn pipeline...")
## Pipeline Steps
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(sparse=False, handle_unknown='ignore'))]) ## Use "sparse=False" because Random Forests cannot take sparse matrices, only dense matrices.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)])
## Pipeline defining
print("Defining Random Survival Forest pipeline from SciKit Survival")
rsf = make_pipeline(
    preprocessor,
    RandomSurvivalForest(n_estimators=150, ## Default 100
                         min_samples_split=0.01, ## Default 6
                         min_samples_leaf=0.005, ## Default 3
                         max_features="sqrt", ## Defaults to none when not defined
                         n_jobs=-1, ## Use all cores (default None)
                         oob_score=True,
                         random_state=200) ## Fixed seed for reproducibility
)
##Fitting & Scoring
print("Fitting dataframe to RSF Pipeline")
rsf.fit(X_train,y_train)
print("Fitting completed.")
After fitting completes, I try to run the following:
as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)
This raises the following error:
AttributeError Traceback (most recent call last)
<ipython-input-97-9a92b22d8026> in <module>
----> 1 as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)
C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in score(self, X, y)
788 score : float
789 """
--> 790 estimate = self._do_predict(X)
791 score = self._score_func(
792 survival_train=self._train_y,
C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in _do_predict(self, X)
768
769 def _do_predict(self, X):
--> 770 predict_func = getattr(self.estimator_, self._predict_func)
771 return predict_func(X)
772
AttributeError: 'as_concordance_index_ipcw_scorer' object has no attribute 'estimator_'
I also tried passing only the RSF step of the pipeline, without success:
as_concordance_index_ipcw_scorer(rsf[1]).score(X_test,y_test)
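My tentative reading of the traceback is that score() looks up an estimator_ attribute on the wrapper itself, and that, following the scikit-learn convention for trailing-underscore attributes, this only exists once fit() has been called on the wrapper, not just on the model inside it. Below is a minimal sketch of that pattern in plain Python. ToyScorerWrapper and ToyMajorityModel are hypothetical names I made up for illustration; this is not the sksurv implementation, and the accuracy score is only a placeholder metric.

```python
class ToyScorerWrapper:
    """Hypothetical stand-in for a scoring wrapper; NOT the sksurv code."""

    def __init__(self, estimator):
        # Only stores the estimator it was given; nothing is fitted here.
        self.estimator = estimator

    def fit(self, X, y):
        # Trailing-underscore attributes are created in fit(), by convention.
        self.estimator_ = self.estimator.fit(X, y)
        self._train_y = list(y)
        return self

    def score(self, X, y):
        # Raises AttributeError if fit() was never called on the wrapper.
        predictions = self.estimator_.predict(X)
        hits = sum(p == t for p, t in zip(predictions, y))
        return hits / len(y)


class ToyMajorityModel:
    """Hypothetical estimator that always predicts the most common label."""

    def fit(self, X, y):
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


X_train, y_train = [[0], [1], [2]], [1, 1, 0]
X_test, y_test = [[3], [4]], [1, 0]

# Fitting the inner model alone does not create estimator_ on the wrapper.
model = ToyMajorityModel().fit(X_train, y_train)
try:
    ToyScorerWrapper(model).score(X_test, y_test)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Fitting through the wrapper creates estimator_, so score() can run.
wrapped = ToyScorerWrapper(ToyMajorityModel()).fit(X_train, y_train)
toy_score = wrapped.score(X_test, y_test)
```

If that reading is right, the equivalent in my case would presumably be as_concordance_index_ipcw_scorer(rsf).fit(X_train, y_train).score(X_test, y_test), but I have not confirmed this.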
Any suggestions?
Apologies for the length or any missing information. I'm new to pipelines and scikit-survival and wanted to give as much detail as I could.
Thanks