Sklearn Voting ensemble with models using different features and testing with k-fold cross validation

Question

I have a data frame with 4 different groups of features.

I need to train 4 different models, one per feature group, and combine them with a voting ensemble (VotingClassifier). I also need to evaluate the ensemble using k-fold cross validation.

However, I am finding it difficult to combine different feature sets, the voting classifier, and k-fold cross validation using the functionality available in sklearn. This is the code that I have so far:

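from sklearn import preprocessing, svm
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

y = df1.index
x = preprocessing.scale(df1)

SVM = svm.SVC(kernel='rbf', C=1)
rf = RandomForestClassifier(n_estimators=200)
ann = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(25, 2), random_state=1)
neigh = KNeighborsClassifier(n_neighbors=10)

# One (name, estimator) pair per feature group
models = list()
models.append(('facial', SVM))
models.append(('posture', rf))
models.append(('computer', ann))
models.append(('physio', neigh))

ens = VotingClassifier(estimators=models)

cv = KFold(n_splits=10, random_state=None, shuffle=True)
scores = cross_val_score(ens, x, y, cv=cv, scoring='accuracy')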

As you can see, this program uses the same features for all 4 models. How can I change it so that each model is trained on its own feature group?

This works fine, but my objective is to use different groups of features for each model. Here all models use all the features available in my dataset. — Chamila Wijayarathna, May 28 '20 at 13:55
This might be helpful https://stackoverflow.com/questions/45074579/votingclassifier-different-feature-sets — Parthasarathy Subburaj, May 28 '20 at 14:10
I already referred to this; however, the answers posted there do not use k-fold cross validation. — Chamila Wijayarathna, May 28 '20 at 14:13
You need to add a column selection step before each estimator. See [the example here](https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#use-columntransformer-by-selecting-column-by-names). Your final `VotingClassifier` will then have a list of pipelines (one for each column selector and estimator). Try to implement this approach. If you are still not able to solve it, I will post an answer. — Vivek Kumar, May 28 '20 at 16:12
I managed to get the cross validation part working, but I am not sure how to create the pipeline with ColumnTransformer. I tried ColumnSelector from 'mlxtend', but I get a type error saying "argument of type 'ColumnSelector' is not iterable". https://gist.github.com/cdwijayarathna/5425919a39dea2f8e9d8bf79c02d544d — Chamila Wijayarathna, May 28 '20 at 16:42
@VivekKumar I updated the code to follow the example you provided, https://gist.github.com/cdwijayarathna/3dd073cf3ab99b9e757b82e701f67525. However, I am still getting "TypeError: argument of type 'ColumnTransformer' is not iterable". What am I missing here? — Chamila Wijayarathna, May 28 '20 at 17:15
I did manage to get it to work: https://stackoverflow.com/questions/62079006/sklearn-pipeline-argument-of-type-columntransformer-is-not-iterable/62079963#62079963 — Chamila Wijayarathna, May 29 '20 at 06:32

score 0 · Answer 1 · answered May 29 '20 at 06:35

I managed to achieve this using Pipelines:

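from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

y = df1.index
# Keep the DataFrame: ColumnTransformer selects columns by name,
# and scaling is done inside each pipeline
x = df1

SVM = SVC(kernel='rbf', C=1)

phy_features = ['A', 'B', 'C']
phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])

fa_features = ['D', 'E', 'F']
fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])

pipe_phy = Pipeline(steps=[('preprocessor', phy_processer), ('classifier', SVM)])
pipe_fa = Pipeline(steps=[('preprocessor', fa_processer), ('classifier', SVM)])

# VotingClassifier expects a list of (name, estimator) tuples
ens = VotingClassifier(estimators=[('phy', pipe_phy), ('fa', pipe_fa)])

cv = KFold(n_splits=10, random_state=None, shuffle=True)
for train_index, test_index in cv.split(x):
    x_train, x_test = x.iloc[train_index], x.iloc[test_index]
    y_train, y_test = y[train_index], y[test_index]
    ens.fit(x_train, y_train)
    print(ens.score(x_test, y_test))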

If you receive a TypeError when using ColumnTransformer, please refer to "sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable".
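For reference, the same idea can also be run through cross_val_score instead of a manual loop over the folds. The following is only a minimal sketch under stated assumptions: the synthetic DataFrame stands in for df1, the column names 'A' to 'F' and the group names are placeholders, make_branch is a hypothetical helper, and the choice of classifiers is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for df1 (assumption): six numeric columns, binary label
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=list("ABCDEF"))
y = (X["A"] + X["D"] > 0).astype(int)


def make_branch(name, columns, classifier):
    """Hypothetical helper: keep only `columns`, scale them, then classify."""
    # ColumnTransformer drops all unlisted columns (remainder='drop' is the default)
    selector = ColumnTransformer([(name, StandardScaler(), columns)])
    return Pipeline([("select", selector), ("classifier", classifier)])


# Each entry is a (name, estimator) tuple, as VotingClassifier requires
ens = VotingClassifier(estimators=[
    ("physio", make_branch("physio", ["A", "B", "C"], SVC(kernel="rbf", C=1))),
    ("facial", make_branch("facial", ["D", "E", "F"],
                           RandomForestClassifier(n_estimators=50, random_state=0))),
])

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(ens, X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

Because the scaler sits inside each pipeline, it is refit on the training part of every fold, so no statistics from the held-out fold leak into preprocessing. X must stay a pandas DataFrame for the name-based column selection to work.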
