The apply method applies a function to each element in the Series (or, in the case of a DataFrame, to either each row or each column depending on the chosen axis). Here you expect your function to process the entire Series and to output a new Series in its stead.
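To make the element-wise behaviour concrete, here is a small sketch with made-up data (the Series s, the helper record and the frame df are illustrative only). It counts how often Series.apply calls the function and shows that DataFrame.apply with the default axis=0 hands each column over as a whole Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="Value")  # illustrative data

# Series.apply invokes the function once per element, so each call gets a scalar.
calls = []

def record(x):
    calls.append(x)
    return x * 2

doubled = s.apply(record)
print(len(calls))        # 3 -> one call per element
print(doubled.tolist())  # [2.0, 4.0, 6.0]

# DataFrame.apply with axis=0 (the default) instead passes each column as a Series.
df = pd.DataFrame({"A": [1, 2, 3], "B": [0, 5, 6]})
column_types = df.apply(lambda col: type(col).__name__)
print(column_types.tolist())  # ['Series', 'Series']
```

So a function that needs the whole column at once (as a scaler does) is a poor fit for Series.apply.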
You can therefore simply run:
from sklearn.preprocessing import StandardScaler
StandardScaler().fit_transform(df['Value'].values.reshape(-1, 1))
StandardScaler expects a 2D array as input, where each row is a sample consisting of one or more features. Even if there is just a single feature (as seems to be the case in your example), the array has to have the right dimensions. Therefore, before handing your Series over to sklearn, I access its values (the numpy representation) and reshape them accordingly.
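The reshaping step can be sketched end to end as follows, assuming a hypothetical df with a float column named 'Value'. It prints the shapes before and after reshape(-1, 1), checks that StandardScaler rejects the 1D array, and flattens the 2D result with ravel() so it can be stored back as a single column (the column name Value_scaled is an illustrative choice):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"Value": [1.0, 2.0, 3.0]})  # hypothetical example data

values = df["Value"].values       # 1D numpy array, shape (3,)
column = values.reshape(-1, 1)    # 2D: 3 samples x 1 feature, shape (3, 1)
print(values.shape, column.shape)  # (3,) (3, 1)

# Passing the 1D array directly raises a ValueError in scikit-learn.
rejected = False
try:
    StandardScaler().fit_transform(values)
except ValueError:
    rejected = True
print("1D input rejected:", rejected)

scaled = StandardScaler().fit_transform(column)
print(scaled.shape)  # (3, 1): the output is 2D as well

# Flatten back to 1D to store the result as a single column.
df["Value_scaled"] = scaled.ravel()
print(df)
```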
For more details on reshape(-1, ...), check this out: What does -1 mean in numpy reshape?
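As a quick illustration of the -1 placeholder (the array a is arbitrary): numpy infers the size of the dimension marked -1 from the total number of elements and the other dimensions.

```python
import numpy as np

a = np.arange(6)  # 6 elements, shape (6,)

print(a.reshape(-1, 1).shape)  # (6, 1): 6 elements / 1 column = 6 rows
print(a.reshape(-1, 2).shape)  # (3, 2): 6 elements / 2 columns = 3 rows
print(a.reshape(2, -1).shape)  # (2, 3): 6 elements / 2 rows = 3 columns
```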
Now, the best bit. If your entire DataFrame consists of a single column, you could simply do:
StandardScaler().fit_transform(df)
And even if it doesn't, you could still avoid the reshape:
StandardScaler().fit_transform(df[['Value']])
Note how in this case 'Value' is surrounded by two sets of square brackets, so this time the result is not a Series but a DataFrame with a subset of the original columns (useful in case you do not want to scale all of them). Since a DataFrame is already 2-dimensional, you don't need to worry about reshaping.
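The difference between single and double brackets can be checked directly. This sketch uses a hypothetical frame with an extra 'Other' column and confirms that the reshape route and the double-bracket route produce the same scaled values:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"Value": [1.0, 2.0, 3.0], "Other": [10, 20, 30]})  # hypothetical data

print(type(df["Value"]).__name__, df["Value"].ndim)       # Series 1
print(type(df[["Value"]]).__name__, df[["Value"]].shape)  # DataFrame (3, 1)

via_reshape = StandardScaler().fit_transform(df["Value"].values.reshape(-1, 1))
via_subset = StandardScaler().fit_transform(df[["Value"]])
print(np.allclose(via_reshape, via_subset))  # True
```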
Finally, if you want to scale just some of the columns and update your original DataFrame, all you have to do is:
>>> import pandas as pd
>>> from sklearn.preprocessing import StandardScaler
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [0, 5, 6], 'C': [7, 8, 9]})
>>> columns_to_scale = ['A', 'B']
>>> df[columns_to_scale] = StandardScaler().fit_transform(df[columns_to_scale])
>>> df
          A         B  C
0 -1.224745 -1.397001  7
1  0.000000  0.508001  8
2  1.224745  0.889001  9
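To see where these numbers come from, here is a sketch that recomputes them by hand with the z-score formula z = (x - mean) / std. StandardScaler uses the population standard deviation (ddof=0), whereas pandas' own .std() defaults to ddof=1, so ddof=0 is passed explicitly:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'A': [1, 2, 3], 'B': [0, 5, 6], 'C': [7, 8, 9]})
columns_to_scale = ['A', 'B']
subset = df[columns_to_scale]

# Manual z-score per column, using the population standard deviation.
manual = (subset - subset.mean()) / subset.std(ddof=0)
print(manual.round(6))

# Should agree with StandardScaler up to floating-point error.
print(np.allclose(manual.to_numpy(), StandardScaler().fit_transform(subset)))  # True
```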