I'm learning Python/Pandas with a DataFrame having the following structure:

    import pandas as pd

    df = pd.DataFrame({'key': [111, 222, 333, 444, 555, 666, 777, 888, 999],
                       'score1': [-1, 0, 2, -1, 7, 0, 15, 0, 1],
                       'score2': [2, 2, -1, 10, 0, 5, -1, 1, 0]})
    print(df)

       key  score1  score2
    0  111      -1       2
    1  222       0       2
    2  333       2      -1
    3  444      -1      10
    4  555       7       0
    5  666       0       5
    6  777      15      -1
    7  888       0       1
    8  999       1       0

The possible values for the `score1` and `score2` Series are `-1` and all non-negative integers (that is, `0` and up).
My goal is to normalize both columns in the following way:
- If the value is equal to `-1`, then return a missing (`NaN`) value.
- Else, normalize the remaining non-negative integers on a scale between `0` and `1`.
I don't want to overwrite the original Series `score1` and `score2`. Instead, I would like to apply a function to both Series to create two new columns (say `norm1` and `norm2`).
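
To make the intended result concrete, here is a minimal sketch of what I have in mind, assuming that "normalize" means min-max scaling, `(x - min) / (max - min)`, computed only over the values that are not `-1`. The helper name `normalize_ignoring_sentinel` and its `sentinel` parameter are hypothetical names made up for illustration; they are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'key': [111, 222, 333, 444, 555, 666, 777, 888, 999],
                   'score1': [-1, 0, 2, -1, 7, 0, 15, 0, 1],
                   'score2': [2, 2, -1, 10, 0, 5, -1, 1, 0]})


def normalize_ignoring_sentinel(series, sentinel=-1):
    """Min-max scale a Series to [0, 1], turning `sentinel` into NaN.

    Hypothetical helper for illustration, not a pandas function.
    """
    # Keep values that differ from the sentinel; the rest become NaN.
    # The result is a float Series, so the original integers are untouched.
    valid = series.where(series != sentinel)

    # Series.min() and Series.max() skip NaN by default.
    lo, hi = valid.min(), valid.max()

    if hi == lo:
        # Assumption: if every valid value is identical, map them all to 0.0
        # instead of dividing by zero. NaN entries stay NaN.
        return valid * 0.0

    return (valid - lo) / (hi - lo)


# New columns; score1 and score2 are left unchanged.
df['norm1'] = normalize_ignoring_sentinel(df['score1'])
df['norm2'] = normalize_ignoring_sentinel(df['score2'])
print(df)
```

With the sample data, the valid `score1` values range from 0 to 15, so 7 would become 7/15 and the two `-1` rows would become `NaN`. I'm not sure whether this is the idiomatic way to do it.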
I read several posts here that recommend using `MinMaxScaler()` from the `sklearn.preprocessing` module. I don't think this is what I need, since I need an extra condition to take care of the `-1` values.
What I think I need is a specific function that I can apply to both Series. I have also familiarized myself with how normalization works, but I'm having difficulty implementing this function in Python. Any additional help would be greatly appreciated.