I am trying to construct a pipeline with a StandardScaler() and LogisticRegression(). I get different results when I code it with and without the pipeline. Here's my code without the pipeline:
clf_LR = linear_model.LogisticRegression()
scalar = StandardScaler()
X_train_std = scalar.fit_transform(X_train)
X_test_std = scalar.fit_transform(X_test)
clf_LR.fit(X_train_std, y_train)
print('Testing score without pipeline: ', clf_LR.score(X_test_std, y_test))
My code with pipeline:
pipe_LR = Pipeline([('scaler', StandardScaler()),
('classifier', linear_model.LogisticRegression())
])
pipe_LR.fit(X_train, y_train)
print('Testing score with pipeline: ', pipe_LR.score(X_test, y_test))
Here is my result:
Testing score with pipeline: 0.821917808219178
Testing score without pipeline: 0.8767123287671232
While trying to debug the problem, it seems the data is being standardized. But the result with pipeline matches the result of training the model on my original X_train data (without applying StandardScaler()).
clf_LR_orig = linear_model.LogisticRegression()
clf_LR_orig.fit(X_train, y_train)
print('Testing score without Standardization: ', clf_LR_orig.score(X_test, y_test))
Testing score without Standardization: 0.821917808219178
Is there something I am missing in the construction of the pipeline? Thanks very much!