
I'm trying to build a classifier that uses one text feature and one numerical feature.

I'd like to run CountVectorizer on my text, pass its sparse matrix output together with my numerical feature to my classifier, and then run GridSearchCV.
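Done by hand, without a pipeline, that feature construction amounts to stacking the CountVectorizer output next to the numeric column. The sketch below uses hypothetical toy data in place of X_train/y_train. Doing this manually before GridSearchCV would fit the vocabulary on rows that later serve as validation folds, which is why wrapping the steps in a Pipeline is preferable for grid search.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data with the same column layout as X_train
X = pd.DataFrame({
    "text": ["good movie", "bad movie", "great film", "awful film"],
    "num": [5.0, 1.0, 4.0, 2.0],
})
y = np.array([1, 0, 1, 0])

# Bag-of-words counts for the text column: a sparse (n_samples, n_terms) matrix.
# CountVectorizer expects a 1-D sequence of strings, so pass the Series X["text"].
cvec = CountVectorizer()
text_features = cvec.fit_transform(X["text"])

# Append the numeric column as one extra sparse column
num_features = csr_matrix(X[["num"]].to_numpy())
features = hstack([text_features, num_features]).tocsr()

# RandomForestClassifier accepts sparse input directly
clf = RandomForestClassifier(random_state=0)
clf.fit(features, y)
print(features.shape)  # (4, 7): 6 vocabulary terms + 1 numeric column
```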

This is my failed attempt at setting up a pipeline for GridSearchCV.

I've referenced this question: How to access ColumnTransformer elements in GridSearchCV, but I can't seem to get it to work.
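The double-underscore grid keys for nested ColumnTransformer elements can be checked directly: `get_params()` on the outer pipeline lists every nested parameter name. A minimal sketch rebuilding the same pipeline; passing the column as the plain string `'text'` is a choice made here so that CountVectorizer would receive 1-D input, and it doesn't affect the parameter names.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

word_transformer = Pipeline(steps=[("cvec", CountVectorizer())])
preprocessor = ColumnTransformer(
    transformers=[("wt", word_transformer, "text")],  # string column -> 1-D input
    remainder="passthrough",
)
pipe = Pipeline(steps=[("preprocessor", preprocessor), ("RBC", RandomForestClassifier())])

# get_params() (deep=True by default) uses "__" to step into nested estimators:
# <pipeline step>__<transformer name>__<sub-step>__<param>
param_names = sorted(pipe.get_params().keys())
print([name for name in param_names if name.endswith("max_features")])
```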

Any help will be appreciated.

Edit: I got it to work with make_column_transformer.
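The working code isn't shown, so the following is only a guess at an equivalent make_column_transformer setup. One thing that does change: make_column_transformer and make_pipeline name each step after its class in lowercase, so the grid keys must use those generated names instead of `preprocessor`, `wt`, `cvec` and `RBC`. Passing `"text"` as a string rather than `["text"]` is an assumption about what the working version did.

```python
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Step names are generated automatically: 'columntransformer',
# 'countvectorizer' and 'randomforestclassifier'
preprocessor = make_column_transformer(
    (CountVectorizer(), "text"),  # scalar column name -> 1-D input for CountVectorizer
    remainder="passthrough",      # keep the 'num' column as-is
)
pipe = make_pipeline(preprocessor, RandomForestClassifier())

pipe_params = {
    "columntransformer__countvectorizer__max_features": [1500, 2000, 3000],
    "randomforestclassifier__max_features": ["sqrt", "log2"],
}
gs = GridSearchCV(pipe, param_grid=pipe_params, cv=5)
# gs.fit(X_train, y_train)
```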

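from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# X_train contains 2 columns, ['text', 'num']
# y_train contains 1 column, ['label']

# Bag-of-words vectorizer for the text column
word_transformer = Pipeline(steps=[('cvec', CountVectorizer())])

# Apply word_transformer to the text column; other columns ('num') pass through unchanged
preprocessor = ColumnTransformer(transformers=[('wt', word_transformer, ['text'])],
                                 remainder='passthrough')

pipe = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('RBC', RandomForestClassifier())
])

# Nested parameters are addressed as <step>__<transformer>__<sub-step>__<param>
pipe_params = {
    'preprocessor__wt__cvec__max_features': [1500, 2000, 3000],
    'RBC__max_features': ['sqrt', 'log2']
}

gs = GridSearchCV(pipe,
                  param_grid=pipe_params,
                  cv=5)  # 5-fold cross-validation.

gs.fit(X_train, y_train)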
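A likely cause of the failure, assuming it's the usual one since no error message is posted: when the column is given as a list such as `['text']`, ColumnTransformer hands the transformer a 2-D DataFrame, whereas a plain string `'text'` gives it a 1-D Series. CountVectorizer expects a 1-D iterable of documents, and iterating over a DataFrame yields its column labels, so it would learn from the header instead of the rows and return the wrong number of rows for stacking with `num`. A sketch on hypothetical toy data showing the difference and running the same grid search with the string form (the small `max_features` values and `cv=2` are only to suit the toy data):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for X_train / y_train
X_train = pd.DataFrame({
    "text": ["good movie", "bad movie", "great film", "awful film",
             "good film", "bad film", "great movie", "awful movie"],
    "num": [5, 1, 4, 2, 5, 1, 4, 2],
})
y_train = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1-D labels avoid a DataConversionWarning

# A list selects a 2-D DataFrame; iterating it yields column labels, not documents
print(list(X_train[["text"]]))  # ['text']
# A string selects a 1-D Series of documents, which is what CountVectorizer expects
print(X_train["text"].shape)    # (8,)

word_transformer = Pipeline(steps=[("cvec", CountVectorizer())])
preprocessor = ColumnTransformer(
    transformers=[("wt", word_transformer, "text")],  # 'text', not ['text']
    remainder="passthrough",
)
pipe = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("RBC", RandomForestClassifier(random_state=0)),
])

pipe_params = {
    "preprocessor__wt__cvec__max_features": [2, 4],
    "RBC__max_features": ["sqrt", "log2"],
}
gs = GridSearchCV(pipe, param_grid=pipe_params, cv=2)
gs.fit(X_train, y_train)
print(gs.best_params_)
```

If `y_train` is a one-column DataFrame, passing `y_train["label"]` (or `y_train.values.ravel()`) gives the 1-D target the classifier expects.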
  • What's going wrong? – Ben Reiniger Sep 12 '21 at 17:13
  • It seems you have solved your own question (as per the edit you have posted). Voting to close the question. – Akshay Sehgal Sep 13 '21 at 00:19
  • @Ben Reiniger I'll add my error message after work. I'm not sure whether it's a syntax error, but it seems to be solved by using make_column_transformer: https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html – Chee Yuan Ng Sep 13 '21 at 00:19
