I am using sklearn's DictVectorizer to construct a large, sparse feature matrix, which is fed to an ElasticNet model. Elastic net (and similar linear models) works best when the predictors (the columns of the feature matrix) are centered and scaled. The recommended approach is to build a Pipeline that applies a StandardScaler before the regressor; however, StandardScaler cannot center sparse input (fitting it with the default with_mean=True raises an error), as stated in the docs.
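A minimal sketch of the setup described above, assuming a recent scikit-learn. The toy records, target values and the alpha/l1_ratio settings are invented for illustration only. It shows that the default StandardScaler rejects the sparse output of DictVectorizer, while StandardScaler(with_mean=False), which scales without centering, fits inside a Pipeline:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy records standing in for the real (much larger) feature dicts.
train_records = [
    {"city": "London", "rooms": 2, "area": 50.0},
    {"city": "Paris", "rooms": 3, "area": 70.0},
    {"city": "London", "rooms": 4, "area": 95.0},
    {"city": "Berlin", "rooms": 1, "area": 30.0},
]
y_train = np.array([300.0, 420.0, 560.0, 180.0])

# DictVectorizer returns a scipy.sparse matrix by default (sparse=True).
X_train = DictVectorizer().fit_transform(train_records)

# The default StandardScaler (with_mean=True) raises ValueError on sparse input:
# subtracting the column means would turn the implicit zeros into nonzeros.
try:
    StandardScaler().fit(X_train)
    centering_failed = False
except ValueError as exc:
    centering_failed = True
    print("StandardScaler() on sparse input:", exc)

# Scaling without centering keeps the matrix sparse, so it can sit in a Pipeline.
model = make_pipeline(
    DictVectorizer(),
    StandardScaler(with_mean=False),
    ElasticNet(alpha=0.1, l1_ratio=0.5),
)
model.fit(train_records, y_train)

# Every step before the regressor still produces a sparse matrix.
X_scaled = model[:-1].transform(train_records)
print(type(X_scaled).__name__, X_scaled.shape)
```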
I thought of using the normalize=True flag in ElasticNet, which seems to support sparse data, but it's not clear whether the normalization is also applied to the test data at prediction time. Does anyone know whether normalize=True applies at prediction time as well? If not, is there a way to apply the same standardization to the training and test sets when dealing with sparse features?
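One way to check what happens to test data is to rebuild predict() by hand from the fitted pipeline steps. The sketch below uses invented training and test records (not real data) and a Pipeline with StandardScaler(with_mean=False); it checks that the fitted scaler divides the test columns by the scale_ learned on the training set rather than re-estimating it. It does not exercise ElasticNet's normalize=True flag, which scikit-learn deprecated in 1.0 and removed in 1.2.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test records (made up for illustration).
train_records = [
    {"city": "London", "rooms": 2, "area": 50.0},
    {"city": "Paris", "rooms": 3, "area": 70.0},
    {"city": "London", "rooms": 4, "area": 95.0},
    {"city": "Berlin", "rooms": 1, "area": 30.0},
]
y_train = np.array([300.0, 420.0, 560.0, 180.0])
test_records = [
    {"city": "Paris", "rooms": 2, "area": 55.0},
    {"city": "Rome", "rooms": 3, "area": 80.0},  # "city=Rome" never appears in training
]

model = make_pipeline(
    DictVectorizer(),
    StandardScaler(with_mean=False),
    ElasticNet(alpha=0.1, l1_ratio=0.5),
)
model.fit(train_records, y_train)

# make_pipeline names each step after its lowercased class name.
vec = model.named_steps["dictvectorizer"]
scaler = model.named_steps["standardscaler"]
enet = model.named_steps["elasticnet"]

scale_before = scaler.scale_.copy()
auto = model.predict(test_records)

# Manual prediction using only training-time statistics: the vectorizer reuses the
# training vocabulary (unseen features such as "city=Rome" are silently dropped),
# and dividing column j by scale_[j] before the dot product is equivalent to
# dividing coef_[j] by scale_[j].
X_test = vec.transform(test_records)
manual = X_test @ (enet.coef_ / scaler.scale_) + enet.intercept_

print(np.allclose(manual, auto))                    # predict() used the training scale_
print(np.array_equal(scale_before, scaler.scale_))  # predicting did not refit the scaler
```

Because the Pipeline stores the fitted scaler, calling model.predict on new records applies the training-set scaling automatically; the test set needs no separate preprocessing step.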