A standard scaler is usually used to fit a normal distribution with the data, and then calculate the Z-scores. This thus means that first the mean μ and standard deviation σ of the data are calculated, and then the Z-scores are calculated with z = (x - μ) / σ.
By setting with_mean
or with_std
to False
, we respectively set the mean μ to 0
and the standard deviation σ to 1. If both are set to False
, we thus calculate the Z-score of a standard normal distribution [wiki].
The main use case of setting with_mean
to False
is processing sparse matrices. Sparse matrices contain a significant amount of zeros, and are therefore stored in a way that the zeros usually use no (or very little) memory. If we would fit the mean, and then calculate the z-score, it is almost certain that all zeros will be mapped to non-zero values, and thus use (significant amounts of) memory. For large sparse matrices, that can result in a memory error: the data is that large, that the memory is not able to store the matrix anymore. By setting μ=0, this means that values that are zero, will map on zero. The result of the standard scaler is a sparse matrix with the same shape.