I have looked at similar questions, such as this one, but none of the suggested solutions worked in my case.
I am trying to build a text classification prediction model.
from sklearn import metrics, naive_bayes

def train_model(classifier, feature_vector_train, label, feature_vector_valid, is_neural_net=False):
    # fit the training dataset on the classifier
    classifier.fit(feature_vector_train, label)
    # predict the labels on validation dataset
    predictions = classifier.predict(feature_vector_valid)
    if is_neural_net:
        predictions = predictions.argmax(axis=-1)
    return metrics.accuracy_score(predictions, train_label)

# Naive Bayes on Word Level TF IDF Vectors
accuracy = train_model(naive_bayes.MultinomialNB(), train_text, train_label, test_text)
print("NB, WordLevel TF-IDF: ", accuracy)
However, the Naive Bayes run raises the following error:
ValueError: Found input variables with inconsistent numbers of samples: [500, 3100]
My training data:
print(train_text.shape)
type(train_text)
returns
(3100, 3013)
scipy.sparse.csr.csr_matrix
My training labels:
print(train_label.shape)
type(train_label)
returns
(3100,)
numpy.ndarray
My test dataset:
print(test_text.shape)
type(test_text)
returns
(500, 3013)
scipy.sparse.csr.csr_matrix
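A minimal sketch that should reproduce the same error without my data. The random matrices and binary labels are stand-ins, not my real data (the density, the seed and the two-class labels are assumptions); only the shapes match those above.

```python
import numpy as np
from scipy import sparse
from sklearn import metrics, naive_bayes

rng = np.random.default_rng(0)

# Stand-in data (assumed values) with the same shapes as my real data.
train_text = sparse.random(3100, 3013, density=0.01, format="csr", random_state=0)
train_label = rng.integers(0, 2, size=3100)
test_text = sparse.random(500, 3013, density=0.01, format="csr", random_state=1)

classifier = naive_bayes.MultinomialNB()
classifier.fit(train_text, train_label)       # 3100 rows, 3100 labels
predictions = classifier.predict(test_text)   # one prediction per test row

try:
    # Same call as in train_model: test predictions vs. training labels.
    metrics.accuracy_score(predictions, train_label)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(predictions.shape)
print(error_message)
```

In this sketch the fit and predict calls complete, and the exception is raised by the accuracy_score call.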
I have tried every type of transformation I could think of. Can anyone recommend a solution? Thanks.