I'm trying to do classification using one text feature and one numerical feature.
I'd like to run CountVectorizer on my text, pass the resulting sparse matrix together with my numerical feature to my classifier, and then run GridSearchCV.
This is my failed attempt at setting up a pipeline for GridSearchCV.
I've looked at "How to access ColumnTransformer elements in GridSearchCV", but I can't get it to work.
Any help will be appreciated.
Edit: got it to work with `make_column_transformer`.
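One detail worth knowing about `make_column_transformer` (and `make_pipeline`): they name each step after its class in lowercase, so the grid keys change from the hand-named `preprocessor__wt__cvec__...` form. A minimal sketch, assuming the same `'text'`/`'num'` columns, that lists the generated names instead of guessing them. Note the column is given as the string `'text'`, not `['text']`, so CountVectorizer receives a 1-D sequence of documents:

```python
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# make_column_transformer / make_pipeline auto-name steps by lowercased class name.
preprocessor = make_column_transformer(
    (CountVectorizer(), 'text'),  # string column -> 1-D input for CountVectorizer
    remainder='passthrough',      # 'num' is appended unchanged
)
pipe = make_pipeline(preprocessor, RandomForestClassifier())

# get_params() includes nested parameters, so it shows the exact keys GridSearchCV accepts.
param_names = [name for name in pipe.get_params() if name.endswith('max_features')]
print(param_names)
```

With these auto-generated names, the grid from the question would use keys such as `columntransformer__countvectorizer__max_features` and `randomforestclassifier__max_features`.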
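```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# X_train contains 2 columns, ['text','num']
# y_train contains 1 column, ['label']
word_transformer = Pipeline(steps=[('cvec', CountVectorizer())])

# Apply the text pipeline to 'text'; 'num' is passed through unchanged.
# Note: ['text'] (a list) selects a 2-D DataFrame, but CountVectorizer
# expects a 1-D sequence of strings, which is likely why this fails.
preprocessor = ColumnTransformer(transformers=[('wt', word_transformer, ['text'])],
                                 remainder='passthrough')

pipe = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('RBC', RandomForestClassifier())
])

# Grid keys follow <step>__<sub-step>__<parameter>
pipe_params = {
    'preprocessor__wt__cvec__max_features': [1500, 2000, 3000],
    'RBC__max_features': ['sqrt', 'log2']
}

gs = GridSearchCV(pipe,
                  param_grid=pipe_params,
                  cv=5)  # 5-fold cross-validation.
gs.fit(X_train, y_train)
```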
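For comparison, here is a sketch of the same explicitly named pipeline with the one change that makes it run: the column is selected with the string `'text'` rather than the list `['text']`. The ten-row `X_train`/`y_train` below are hypothetical toy data, and the `max_features` values are shrunk to suit that tiny vocabulary; the original grid keys work unchanged:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real X_train / y_train.
X_train = pd.DataFrame({
    'text': ['good movie', 'bad plot', 'great acting', 'terrible film',
             'good fun', 'bad acting', 'great story', 'terrible plot',
             'good story', 'bad movie'],
    'num': [5, 1, 4, 2, 5, 1, 4, 2, 5, 1],
})
y_train = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], name='label')

word_transformer = Pipeline(steps=[('cvec', CountVectorizer())])
preprocessor = ColumnTransformer(
    transformers=[('wt', word_transformer, 'text')],  # string -> 1-D Series of documents
    remainder='passthrough',                         # 'num' is appended as-is
)
pipe = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('RBC', RandomForestClassifier(random_state=0)),
])

pipe_params = {
    'preprocessor__wt__cvec__max_features': [2, 5, None],  # None = keep full vocabulary
    'RBC__max_features': ['sqrt', 'log2'],
}
gs = GridSearchCV(pipe, param_grid=pipe_params, cv=5)  # 5-fold cross-validation
gs.fit(X_train, y_train)
print(gs.best_params_)
```

Passing `y_train` as a 1-D Series (for example `y_train['label']` when it is a one-column DataFrame) also avoids scikit-learn's column-vector `DataConversionWarning`.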