
I need to apply sklearn's StandardScaler to a single column, col1, of a DataFrame:

df:

col1  col2  col3
1     0     A
1     10    C
2     1     A
3     20    B

This is how I did it:

from sklearn.preprocessing import StandardScaler

def listOfLists(lst):
    return [[el] for el in lst]

def flatten(t):
    return [item for sublist in t for item in sublist]

scaler = StandardScaler()

df['col1'] = flatten(scaler.fit_transform(listOfLists(df['col1'].to_numpy().tolist())))

However, when I apply inverse_transform, it does not give me the initial values of col1. Instead, it returns the normalised values:

scaler.inverse_transform(flatten(scaler.fit_transform(listOfLists(df['col1'].to_numpy().tolist()))))

or:

scaler.inverse_transform(df['col1'])
desertnaut
Fluxy

1 Answer


You could fit the scaler directly on the column. Since the scaler expects a 2D array, select the column as a DataFrame with df[['col1']]:

>>> scaler = StandardScaler()
>>> arr = scaler.fit_transform(df[['col1']]).flatten()
>>> arr
array([-0.90453403, -0.90453403,  0.30151134,  1.50755672])

>>> scaler.inverse_transform(arr)
array([1., 1., 2., 3.])
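
For reference, here is a minimal, self-contained sketch of the full round trip. It rebuilds the example df from the question, and the column name col1_scaled is a hypothetical name introduced only for illustration. It keeps every input to the scaler 2D, so it does not depend on how a given scikit-learn version handles 1D arrays. It also writes the scaled values to a separate column: if col1 is overwritten and fit_transform is then called on it again, the scaler is refit on the already-scaled data, and inverse_transform can only return those scaled values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "col1": [1, 1, 2, 3],
    "col2": [0, 10, 1, 20],
    "col3": ["A", "C", "A", "B"],
})

scaler = StandardScaler()

# Fit once on the original values; df[["col1"]] is 2D with shape (4, 1).
scaled = scaler.fit_transform(df[["col1"]])

# Store the result in a new column (hypothetical name) so the
# original values of col1 are not overwritten.
df["col1_scaled"] = scaled.ravel()

# Reshape back to 2D before handing the values to the same fitted scaler.
scaled_2d = df["col1_scaled"].to_numpy().reshape(-1, 1)
restored = scaler.inverse_transform(scaled_2d).ravel()

# Pitfall: refitting on already-scaled data. The new scaler learns a mean
# of about 0 and a scale of about 1, so its inverse_transform gives back
# the scaled values, not the original ones.
refit = StandardScaler()
roundtrip = refit.inverse_transform(refit.fit_transform(scaled_2d)).ravel()

print(restored)   # original col1 values
print(roundtrip)  # still the scaled values
```

Keeping one fitted scaler around and calling only transform or inverse_transform on it afterwards avoids the refitting problem.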
  • The first part works well and simplifies my approach, which is nice. But the inverse transform still gives me normalized values rather than the original values of `col1`. – Fluxy Mar 02 '22 at 19:17
  • @Fluxy maybe it's the version of scikit-learn? Because your code returns the correct initial data too. –  Mar 02 '22 at 19:18
  • I use the version 0.24.1 – Fluxy Mar 02 '22 at 19:19
  • @Fluxy could you try without flattening: `scaler.inverse_transform(scaler.fit_transform(df[['col1']]))` –  Mar 02 '22 at 19:19
  • Yes, it works well without flatten. Thanks. – Fluxy Mar 02 '22 at 19:24