I have this code working fine:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# predictors, bow_vector and classifier are defined earlier (not shown here)
df_amazon = pd.read_csv("datasets/amazon_alexa.tsv", sep="\t")
X = df_amazon['variation']  # the features we want to analyze
ylabels = df_amazon['feedback']  # the labels, or answers, we want to test against
X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)
# Create pipeline using Bag of Words
pipe = Pipeline([('cleaner', predictors()),
                 ('vectorizer', bow_vector),
                 ('classifier', classifier)])
pipe.fit(X_train, y_train)
But if I try to add one more feature to the model, replacing
X = df_amazon['variation']
with
X = df_amazon[['variation', 'verified_reviews']]
I get this error message from sklearn when I call fit:
ValueError: Found input variables with inconsistent numbers of samples: [2, 2205]
So fit works when X_train and y_train have the shapes (2205,) and (2205,), but not when the shapes are changed to (2205, 2) and (2205,).
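A minimal sketch of where the 2 in the error might come from, assuming the text steps in the pipeline simply iterate over whatever they are given. The toy frame below is made up for illustration and stands in for df_amazon; a plain CountVectorizer stands in for bow_vector, whose real configuration is not shown here.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for df_amazon; these rows are invented for illustration only.
toy = pd.DataFrame({
    "variation": ["Black", "White", "Black"],
    "verified_reviews": ["love it", "works fine", "stopped working"],
})

X_one = toy["variation"]                        # Series, shape (3,)
X_two = toy[["variation", "verified_reviews"]]  # DataFrame, shape (3, 2)

# Iterating a Series yields its values, one per row.
# Iterating a DataFrame yields its column names, one per column.
items_one = list(X_one)
items_two = list(X_two)

# CountVectorizer iterates over its input, so for the DataFrame it treats
# the two column names as the only two "documents".
rows_one = CountVectorizer().fit_transform(X_one).shape[0]
rows_two = CountVectorizer().fit_transform(X_two).shape[0]

print(items_one, rows_one)
print(items_two, rows_two)
```

If the same thing happens in the real pipeline, the classifier would receive 2 rows of features against 2205 labels, which would match the [2, 2205] in the message.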
What's the best way to deal with that?