I have a dataframe:

df =  A  B  Text
      1  2  'hello and good morning'
      3  4  'I am watching TV'

I want to apply a sentence classification pipeline:

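from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

text_feats = ['Text']
num_feats = ['A', 'B']
# TF-IDF features for the text column
text_transformer = Pipeline(steps=[
    ('tfidf_vectorizer', TfidfVectorizer())])
# Standardize the numeric columns
numeric_transformer = Pipeline(steps=[
    ('scale', StandardScaler())])
preprocessor = ColumnTransformer(
    transformers=[
        ('text', text_transformer, text_feats),
        ('num', numeric_transformer, num_feats)])
clf_pipe = Pipeline(steps=[('preprocessor', preprocessor),
                           ('model', RandomForestClassifier())])
# Y_COL holds the name of the target column
clf_pipe.fit(df, df[Y_COL])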

But I get the following error, both with TfidfVectorizer and with CountVectorizer:

Found input variables with inconsistent numbers of samples: [1, 2]

Any idea what the problem is?
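
A possible explanation, offered as a sketch rather than a confirmed diagnosis: when a column is selected with a list such as ['Text'], ColumnTransformer passes a two-dimensional DataFrame to the transformer. TfidfVectorizer and CountVectorizer expect a one-dimensional sequence of strings, so iterating over that DataFrame yields only its column name and the vectorizer sees a single document instead of one per row. Selecting the column with a plain string ('Text') passes a one-dimensional Series instead. The example below assumes a made-up four-row frame and a hypothetical `label` target column standing in for Y_COL; neither comes from the original data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration; 'label' is a hypothetical target column.
df = pd.DataFrame({
    "A": [1, 3, 5, 7],
    "B": [2, 4, 6, 8],
    "Text": [
        "hello and good morning",
        "I am watching TV",
        "good morning to everyone",
        "the TV is on",
    ],
    "label": [0, 1, 0, 1],
})

text_transformer = Pipeline(steps=[("tfidf_vectorizer", TfidfVectorizer())])
numeric_transformer = Pipeline(steps=[("scale", StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        # A plain string selects the column as a 1-D Series,
        # which is the input shape the text vectorizers expect.
        ("text", text_transformer, "Text"),
        # A list selects a 2-D frame, which StandardScaler expects.
        ("num", numeric_transformer, ["A", "B"]),
    ]
)

clf_pipe = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestClassifier(random_state=0)),
])

X = df[["A", "B", "Text"]]
clf_pipe.fit(X, df["label"])

# One row of features and one prediction per input row.
features = clf_pipe.named_steps["preprocessor"].transform(X)
predictions = clf_pipe.predict(X)
print(features.shape[0], len(predictions))
```

If this is indeed the cause, the only change needed in the pipeline above is replacing text_feats = ['Text'] with the string 'Text' in the ColumnTransformer entry for the text column.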

Cranjis
Might https://stackoverflow.com/questions/70550018/sklearn-custom-transformers-with-pipeline-all-the-input-array-dimensions-for-th/70550548#70550548 help? – amiola Nov 25 '22 at 09:09
