How to avoid numeric error when normalizing min-max near zero?

Question

I am using

from sklearn import preprocessing

v01 = preprocessing.minmax_scale(v01, feature_range=(rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max()))

and it usually works, except that sometimes I get errors like

    preprocessing.minmax_scale(v01, feature_range=(rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max()))
  File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\preprocessing\_data.py", line 510, in minmax_scale
    X = s.fit_transform(X)
  File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\base.py", line 571, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\preprocessing\_data.py", line 339, in fit
    return self.partial_fit(X, y)
  File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\preprocessing\_data.py", line 365, in partial_fit
    " than maximum. Got %s." % str(feature_range))
ValueError: Minimum of desired feature range must be smaller than maximum. Got (-6.090366306515144e-15, -6.090366306515144e-15).

This looks like a numeric error, and I would like to see a flat line in this case.

How can I get around this without uglifying the code too much?

score 0 · Answer 1 · answered May 12 '21 at 12:44

Are you sure you're interpreting feature_range correctly? According to the docs, it is the range you want the output data to fall in, say [0, 1].

The docs also state that feature_range[0] (i.e., the minimum) must be strictly less than feature_range[1] (i.e., the maximum). In your case both are equal (-6.09e-15 and -6.09e-15), hence the error.
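
To make the point concrete, here is a minimal sketch (the `data` array is made up for illustration and is not the asker's `v01`). It shows `feature_range` acting as the output interval, and the same ValueError appearing as soon as both bounds are equal:

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# Illustrative input only; not the data from the question.
data = np.array([2.0, 4.0, 6.0])

# feature_range is the desired OUTPUT interval: the smallest input maps to
# the lower bound and the largest input maps to the upper bound.
scaled = minmax_scale(data, feature_range=(0.0, 1.0))
print(scaled)

# Equal bounds are rejected, which is what the traceback reports.
bound = -6.090366306515144e-15
try:
    minmax_scale(data, feature_range=(bound, bound))
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

So the check that fails is on the two numbers passed as `feature_range`, not on the values being scaled.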

Nikhil Kumar

Adding a guard `if rf_imp_vec_truncated.min() >= rf_imp_vec_truncated.max(): return` doesn't help: the `if` branch is never entered. This smells like a numeric error from subtracting two floats. – Gulzar May 13 '21 at 11:52

score 0 · Answer 2 · answered May 13 '21 at 12:23

The cleanest solution I could find for this was to add epsilon to the max:

v01 = preprocessing.minmax_scale(v01, feature_range=(rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max() + np.finfo(rf_imp_vec_truncated.dtype).eps))

Now they are no longer equal.
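
One caveat: machine epsilon is the spacing of floats near 1.0, so adding it works for values near zero like the one in the traceback, but it is absorbed once the magnitude of the max is about 2 or more. As an alternative sketch, assuming a flat line is the desired result when the target range collapses, the degenerate case can be handled explicitly. The helper name `rescale_or_flat` is hypothetical and not part of scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import minmax_scale


def rescale_or_flat(values, target):
    """Rescale `values` into [target.min(), target.max()].

    Hypothetical helper: if the target range is degenerate (bounds equal,
    or a NaN bound), return a flat line at the lower bound instead of
    calling minmax_scale, which would raise.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = float(np.min(target)), float(np.max(target))
    if not lo < hi:
        return np.full_like(values, lo)
    return minmax_scale(values, feature_range=(lo, hi))


# Why a fixed epsilon is fragile: it vanishes next to larger numbers,
# while np.nextafter always moves to the next representable float.
eps = np.finfo(np.float64).eps
print(1000.0 + eps == 1000.0)
print(np.nextafter(1000.0, np.inf) > 1000.0)
```

With the names from the question this would be called as `v01 = rescale_or_flat(v01, rf_imp_vec_truncated)`. If nudging the max is still preferred, `np.nextafter(max_value, np.inf)` keeps the bounds distinct at any magnitude.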
