
I read this post about feature scaling: all-about-feature-scaling

The two main feature scaling techniques are:

  1. Min-max scaler - works well for features whose distributions are not Gaussian.

  2. Standard scaler - works well for features with Gaussian distributions.
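
For reference, both transforms are one-liners; here is a minimal sketch of the two formulas in plain NumPy (the example values are arbitrary, and np.std matches StandardScaler's population standard deviation by default):

import numpy as np

x = np.array([14., 90., 80., 90., 70.])

# Min-max scaling: rescale into the [0, 1] range.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: zero mean, unit variance.
x_standard = (x - x.mean()) / x.std()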

I read other posts and examples, and it seems that we always use one scaling method (min-max or standard) for all the features.

I haven't seen an example or paper that suggests the following:

1. Go over all the features, and for each feature:
1.1 Check the feature's distribution.
1.2 If the feature's distribution is Gaussian:
1.2.1 Use the Standard scaler for this feature.
1.3 Otherwise:
1.3.1 Use the min-max scaler for this feature.
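
For concreteness, here is a minimal sketch of that per-feature procedure; using the Shapiro-Wilk test (with a 0.05 threshold) as the Gaussian check is my own assumption, not something from the post:

import pandas as pd
from scipy.stats import shapiro
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def mixed_scale(df, alpha=0.05):
    """Standardize columns that look Gaussian (Shapiro-Wilk p > alpha);
    min-max scale the rest."""
    out = df.copy()
    for col in df.columns:
        _, p_value = shapiro(df[col])
        scaler = StandardScaler() if p_value > alpha else MinMaxScaler()
        out[col] = scaler.fit_transform(df[[col]])
    return out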
My questions:

  1. Why are we not mixing the scaling methods?

  2. What are the disadvantages of my proposal?

  • Apart from being rather off-topic here (not a *programming* question), I think you will get much more reliable answers in [Cross Validated](https://stats.stackexchange.com/help/on-topic), where I suggest you migrate this. – desertnaut May 01 '20 at 01:21

1 Answer


If you mix the scalers per feature, your features will end up on different scales, which is a problem because the features with the larger scale will dominate the rest (e.g., in KNN). The min-max-normalized features will be rescaled into the [0, 1] range, while the standardized ones will be transformed into a range spanning negative to positive values (e.g., roughly [-2, +2], or even wider when outliers produce large z-scores).

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, StandardScaler

dfTest = pd.DataFrame({'A': [14, 90, 80, 90, 70],
                       'B': [10, 107, 110, 114, 113]})

# Min-max scale column A into [0, 1].
scaler = MinMaxScaler()
dfTest['A'] = scaler.fit_transform(dfTest[['A']])

# Standardize column B to zero mean and unit variance.
scaler = StandardScaler()
dfTest['B'] = scaler.fit_transform(dfTest[['B']])

# Plot with equal axis scaling to make the range mismatch visible.
ax = dfTest.plot.scatter('A', 'B')
ax.set_aspect('equal')
plt.show()

[Figure: scatter plot of the scaled data; A spans [0, 1] on the x-axis while B spans roughly [-2.0, 0.6] on the y-axis]
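
To quantify what the plot shows, here is a small follow-up sketch (my addition, not part of the original answer) that measures how much each feature contributes to the pairwise squared Euclidean distances, which is the quantity KNN compares:

import numpy as np

X = dfTest[['A', 'B']].to_numpy()

# Pairwise differences between all rows, separately per feature.
diff = X[:, None, :] - X[None, :, :]

# Each feature's share of the total squared Euclidean distance.
contrib = (diff ** 2).sum(axis=(0, 1))
print(contrib / contrib.sum())  # B accounts for roughly 88% here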

Reveille