I am using sparse matrices to train a logistic regression estimator wrapped in OneVsRestClassifier. The feature set is quite large (~1.6 million features).
When the classifier has to predict, it raises an exception saying that the number of features in the test data and the training data are not equal.
I fail to understand why it expects the number of features to be equal when the data is in a sparse matrix representation. For instance, here is a snippet of my rudimentary code:
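from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

classifier = OneVsRestClassifier(LogisticRegression())
classifier = classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)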
Here the shapes of X_train and X_test are obviously different:
print X_train.shape
(11, 1617899)
print X_test.shape
(3, 83715)
So an exception is raised:
ValueError: X has 83715 features per sample; expecting 1617899
(A little probing of the source code tells me that linear_model/base.py does this comparison in decision_function().)
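To make the problem concrete, here is a minimal sketch that I would expect to reproduce the same kind of error on a small scale. The data is random and the column counts (50 and 20) are made up for illustration; they are not my real matrices. It also prints how many weights the first underlying estimator holds, which seems to be tied to the number of training columns.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)

# Hypothetical stand-ins for the real data: 11 training rows with 50 columns,
# 3 test rows with only 20 columns.
X_train = sparse.csr_matrix(rng.rand(11, 50))
X_test = sparse.csr_matrix(rng.rand(3, 20))
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1])

classifier = OneVsRestClassifier(LogisticRegression())
classifier.fit(X_train, y_train)

# Each underlying binary estimator stores one weight per training column.
n_weights = classifier.estimators_[0].coef_.shape[1]

# Predicting on a matrix with a different column count should fail.
try:
    classifier.predict(X_test)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(n_weights, X_test.shape[1], type(mismatch_error).__name__)
```

So even with sparse input, the fitted model appears to carry a fixed number of coefficients, and the column count of X_test is checked against it.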
How can I fix this?