I have a dataframe:
df =
A  B  Text
1  2  'hello and good morning'
3  4  'I am watching TV'
I want to apply a sentence classification pipeline:
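from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

text_feats = ['Text']
num_feats = ['A', 'B']
text_transformer = Pipeline(steps=[
    ('tfidf_vectorizer', TfidfVectorizer())])
numeric_transformer = Pipeline(steps=[
    ('scale', StandardScaler())])
preprocessor = ColumnTransformer(
    transformers=[
        ('text', text_transformer, text_feats),
        ('num', numeric_transformer, num_feats)])
clf_pipe = Pipeline(steps=[('preprocessor', preprocessor),
                           ('model', RandomForestClassifier())])
# Y_COL holds the name of the label column (not shown in the sample above)
clf_pipe.fit(df, df[Y_COL])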
But I get the following error with both TfidfVectorizer and CountVectorizer:
Found input variables with inconsistent numbers of samples: [1, 2]
Any idea what the problem is?
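
One likely cause, sketched below as an assumption rather than a confirmed diagnosis: when ColumnTransformer receives a list of column names such as ['Text'], it passes a 2D DataFrame to the transformer. The text vectorizers expect a 1D iterable of strings, and iterating over a DataFrame yields its column names, so the vectorizer sees a single "document". Passing the column name as a plain string selects a 1D Series instead. The sample data is rebuilt from the two rows shown in the question, and the variable names are illustrative only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Sample data rebuilt from the two rows shown in the question.
df = pd.DataFrame({
    "A": [1, 3],
    "B": [2, 4],
    "Text": ["hello and good morning", "I am watching TV"],
})

# A list selector yields a 2D DataFrame. Iterating over it gives the column
# names, so the vectorizer sees one "document" (the string "Text").
n_rows_from_list = TfidfVectorizer().fit_transform(df[["Text"]]).shape[0]

# A string selector yields a 1D Series, so each row is one document.
preprocessor = ColumnTransformer(transformers=[
    ("text", TfidfVectorizer(), "Text"),    # string, not ['Text']
    ("num", StandardScaler(), ["A", "B"]),  # numeric columns stay as a list
])
features = preprocessor.fit_transform(df)
n_rows_from_string = features.shape[0]

print(n_rows_from_list, n_rows_from_string)
```

If this is the cause, changing text_feats = ['Text'] to text_feats = 'Text' in the pipeline above should make the text and numeric branches produce the same number of rows.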