Context

I am currently preprocessing my dataset for machine learning and would like to normalise all numeric columns. I found a few solutions, but none of them behaves the way I want.

My goal is to normalise a column so that the lowest value is mapped to 0 and the highest to 1:


Code

     column                  column_normalised
1    10                      0
2    30            ->        1
2    20                      0.5

Question

  • How can I achieve this goal?
  • Would you also normalise numerically encoded categorical features or leave them as they are?
cs95
christophriepe

1 Answer

NumPy's interp might answer your first question:

import numpy as np

# Linearly map the column's [min, max] range onto [0, 1]
df["column_normalised"] = np.interp(x=df["column"],
                                    xp=(df["column"].min(), df["column"].max()),
                                    fp=(0, 1))

Output:

print(df)

   column  column_normalised
1      10                0.0
2      30                1.0
2      20                0.5
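
Since the question asks about all numeric columns, the same min-max mapping can also be written directly in pandas as (x - min) / (max - min). The following is only a minimal sketch, not part of the original answer: the sample frame, the extra "label" column, and the "_normalised" suffix are assumptions made for illustration. Note that a column whose values are all identical would give a division by zero and end up as NaN.

```python
import pandas as pd

# Hypothetical sample data; "label" is included only to show that
# non-numeric columns are skipped.
df = pd.DataFrame({"column": [10, 30, 20], "label": ["a", "b", "c"]})

# Pick out the numeric columns
numeric_cols = df.select_dtypes(include="number").columns

# Min-max scaling: (x - min) / (max - min), computed per column
col_min = df[numeric_cols].min()
col_max = df[numeric_cols].max()
normalised = (df[numeric_cols] - col_min) / (col_max - col_min)

# Attach the scaled columns next to the originals
df = df.join(normalised.add_suffix("_normalised"))
print(df)
```

For the sample data this should give the same values as the np.interp version above for "column", while leaving the original columns untouched.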
Timeless