
I wrote my own normalization module, because it seems sklearn doesn't normalize all the data together (only per column or per row). I have two pieces of code.

First, the code with sklearn:

import numpy as np
from sklearn import preprocessing

data = np.array([[-1], [-0.5], [0], [1], [2], [6], [10], [18]])
print(data)

scaler = preprocessing.MinMaxScaler(feature_range=(5, 10))
print(scaler.fit_transform(data))
print(scaler.inverse_transform(scaler.fit_transform(data)))

Result:

[[-1. ]
 [-0.5]
 [ 0. ]
 [ 1. ]
 [ 2. ]
 [ 6. ]
 [10. ]
 [18. ]]
[[ 5.        ]
 [ 5.13157895]
 [ 5.26315789]
 [ 5.52631579]
 [ 5.78947368]
 [ 6.84210526]
 [ 7.89473684]
 [10.        ]]
[[-1. ]
 [-0.5]
 [ 0. ]
 [ 1. ]
 [ 2. ]
 [ 6. ]
 [10. ]
 [18. ]]
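
For contrast, feeding a 2-D array to MinMaxScaler directly shows the per-column behaviour I mean: each column is scaled to (5, 10) on its own. A quick sketch (the commented values are hand-computed, not taken from a run):

import numpy as np
from sklearn import preprocessing

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])

scaler = preprocessing.MinMaxScaler(feature_range=(5, 10))
# each column is treated as an independent feature:
# column 0 is scaled using min -1 / max 1, column 1 using min 2 / max 18,
# so both columns map their own min to 5 and their own max to 10
print(scaler.fit_transform(data))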

And with my module:

import numpy as np
# scl is the module containing the Scaler class shown below

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
print(data)

scaler = scl.Scaler(feature_range=(5, 10))
print(scaler.transform(data))
print(scaler.inverse_transform(scaler.transform(data)))

Result:

[[-1.   2. ]
 [-0.5  6. ]
 [ 0.  10. ]
 [ 1.  18. ]]
[[ 5.          5.78947368]
 [ 5.13157895  6.84210526]
 [ 5.26315789  7.89473684]
 [ 5.52631579 10.        ]]
[[-1.00000000e+00  2.00000000e+00]
 [-5.00000000e-01  6.00000000e+00]
 [ 1.33226763e-15  1.00000000e+01]
 [ 1.00000000e+00  1.80000000e+01]]

That 1.33226763e-15 doesn't work for me.

I think it occurs because of floating-point rounding, although sklearn doesn't seem to have this problem.
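
For scale, a quick check (plain numpy; residue is just an illustrative name) of how small that value actually is:

import numpy as np

residue = 1.33226763e-15
print("%f" % residue)          # prints 0.000000 at six decimal places
print(np.isclose(residue, 0))  # True: within the default tolerance of zero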

Please tell me: where is my mistake?

import numpy as np


class Scaler:
    def __init__(self, feature_range: tuple = (0, 1)):
        # target range that transform() maps into
        self.scaler_min = feature_range[0]
        self.scaler_max = feature_range[1]

        # data range, remembered by transform() for inverse_transform()
        self.data_min = None
        self.data_max = None

    def transform(self, x: np.ndarray):
        # note: initial=0 forces 0 into the reduction, so the remembered
        # range always includes 0 (min(x.min(), 0), max(x.max(), 0))
        self.data_min = x.min(initial=0)
        self.data_max = x.max(initial=0)

        # scale the whole array (not per column) into [0, 1] ...
        scaled_data = (x - self.data_min) / (self.data_max - self.data_min)
        # ... then stretch and shift into [scaler_min, scaler_max]
        return scaled_data * (self.scaler_max - self.scaler_min) + self.scaler_min

    def inverse_transform(self, x: np.ndarray):
        # undo the two steps of transform() in reverse order
        scaled_data = (x - self.scaler_min) / (self.scaler_max - self.scaler_min)
        return scaled_data * (self.data_max - self.data_min) + self.data_min
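
For comparison, sklearn's MinMaxScaler orders the arithmetic differently: fit precomputes a single multiplier and offset (its scale_ and min_ attributes), transform computes x * scale_ + min_, and inverse_transform computes (x - min_) / scale_. With that ordering an input of 0 round-trips exactly, because a transformed 0 is just the offset, and (offset - offset) / scale is exactly 0.0. A sketch of the same idea applied to this class (the names scale and offset are mine, and I dropped initial=0, which quietly forces 0 into the remembered data range):

import numpy as np


class Scaler:
    """Whole-array min-max scaler, with the arithmetic ordered the way
    sklearn's MinMaxScaler orders it. A sketch, not a drop-in fix."""

    def __init__(self, feature_range: tuple = (0, 1)):
        self.feature_min, self.feature_max = feature_range
        self.scale = None   # single multiplier, set by transform()
        self.offset = None  # single offset, set by transform()

    def transform(self, x: np.ndarray):
        data_min = x.min()
        data_max = x.max()
        self.scale = (self.feature_max - self.feature_min) / (data_max - data_min)
        self.offset = self.feature_min - data_min * self.scale
        return x * self.scale + self.offset

    def inverse_transform(self, x: np.ndarray):
        # exact inverse of "x * scale + offset"; a transformed zero equals
        # offset, so the subtraction cancels it before the division
        return (x - self.offset) / self.scale

With the data above, the zero entry now comes back as exactly 0.0; the other entries remain subject to ordinary floating-point rounding, which is what the comments below are about.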

Comments:

  • What is it that you are concerned about here? `1.33226763e-15` is zero, and if you take the time to format it properly (like with `%f`), then it will print as 0. – Tim Roberts Nov 13 '21 at 03:55
  • @TimRoberts I guess `1.33226763e-15` is not `0`; it is `0.000...133226763`. Perhaps I'm wrong. Please correct me. – Pro Nov 13 '21 at 04:00
  • Given the scale of your input data and the number of significant digits, it is 0. Floating point numbers are approximations. There are always rounding errors. That's fine in computations, but because humans get confused by that (as you are), you need to handle it when you print the data. – Tim Roberts Nov 13 '21 at 04:05
  • @TimRoberts OK, do you mean that `sklearn` formats the data before output? – Pro Nov 13 '21 at 04:09
  • Rather than trying to redo what `sklearn` does already, is there a reason you can't record the shape of your dataset (`S = data.shape`), flatten your data (`flat = data.flatten()`), normalize it (`normalized = scaler.inverse_transform(scaler.fit_transform(flat))`), and then reshape it (`new_normalized = normalized.reshape(S)`)? – ramzeek Nov 13 '21 at 04:42
  • You are not actually separating your data from its tuples. Consider using `.flatten()` – Larry the Llama Nov 13 '21 at 05:30
  • @wikikikitiki You are right; I asked the community how I could do this better and got [advice](https://stackoverflow.com/a/69939745/8497844) to create my own module. Can you give a full answer here? I would accept it. – Pro Nov 13 '21 at 05:54
  • `data.flatten()` doesn't suit me because `scaler.inverse_transform(scaler.fit_transform(flat))` raises an error, but `data.reshape(-1, 1)` works (see the sketch below). Thank you both. – Pro Nov 13 '21 at 07:48
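
Following the thread above, a sketch of the reshape route (ramzeek's `S = data.shape` suggestion plus the `reshape(-1, 1)` that worked; `flatten()` alone fails because MinMaxScaler expects a 2-D array):

import numpy as np
from sklearn import preprocessing

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
S = data.shape

scaler = preprocessing.MinMaxScaler(feature_range=(5, 10))

# a single column, so MinMaxScaler scales the whole array as one feature
scaled = scaler.fit_transform(data.reshape(-1, 1)).reshape(S)
restored = scaler.inverse_transform(scaled.reshape(-1, 1)).reshape(S)

print(scaled)
print(restored)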
