I have a dataset of 5 features. Two of these features are very similar but do not have the same min and max values.
... | feature 2 | feature 3 | ...
--------------------------------
..., 208.429993, 206.619995, ...
..., 207.779999, 205.050003, ...
..., 206.029999, 203.410004, ...
..., 204.429993, 202.600006, ...
..., 206.429993, 204.25, ...
feature 3 is always smaller than feature 2, and it is important that it stays that way after scaling. But since feature 2 and feature 3 do not have the exact same min and max values, after scaling they will both end up with 0 and 1 as min and max by default. This removes the relationship between the values. In fact, after scaling, the first sample becomes:
... | feature 2 | feature 3 | ...
--------------------------------
..., 0.00268, 0.00279, ...
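
A minimal sketch of the effect, using only the five rows shown above (so the scaled numbers will differ from those of my full dataset; `X` and `X_scaled` are just illustrative names):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The five rows shown above; columns are feature 2 and feature 3.
X = np.array([
    [208.429993, 206.619995],
    [207.779999, 205.050003],
    [206.029999, 203.410004],
    [204.429993, 202.600006],
    [206.429993, 204.25],
])

# Before scaling, check that feature 3 is strictly smaller than feature 2 in every row.
print((X[:, 1] < X[:, 0]).all())

# MinMaxScaler fits a separate min and max for each column.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(scaler.data_min_, scaler.data_max_)

# After scaling, run the same check on the scaled columns.
print((X_scaled[:, 1] < X_scaled[:, 0]).all())
```

In this small sample the same row holds the minimum of both columns, so it maps to 0 in both and the strict ordering is already lost.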
This is something that I do not want. I cannot seem to find a way to manually change the min and max values of MinMaxScaler. There are other ugly hacks, such as manipulating the data and combining feature 2 and feature 3 into one for the scaling and splitting them again afterward. But I would like to know first if there is a solution that is handled by sklearn, such as using the same min and max for multiple features.
Otherwise, the simplest workaround would do.
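
For reference, the combine-and-split hack I mentioned would look roughly like this. It is only a sketch on the five rows shown above, and `X`, `flat_scaled` and `X_scaled` are names made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The five rows shown above; columns are feature 2 and feature 3.
X = np.array([
    [208.429993, 206.619995],
    [207.779999, 205.050003],
    [206.029999, 203.410004],
    [204.429993, 202.600006],
    [206.429993, 204.25],
])

# Stack both columns into a single column so the scaler
# sees one shared min and one shared max.
scaler = MinMaxScaler()
flat_scaled = scaler.fit_transform(X.reshape(-1, 1))

# Restore the original two-column layout.
X_scaled = flat_scaled.reshape(X.shape)

# Both columns went through the same increasing linear map,
# so feature 3 should still be smaller than feature 2 in every row.
print((X_scaled[:, 1] < X_scaled[:, 0]).all())
```

With this approach only the overall smallest and largest values land on 0 and 1, so each individual column no longer spans the full range.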