
I am trying to pass a series to a user defined function and getting this error:

Function:

def scale(series):
   sc=StandardScaler()
   sc.fit_transform(series)
   print(series)

Code for calling:

df['Value'].apply(scale) # df['Value'] is a Series having float dtype.

Error:

ValueError: Expected 2D array, got scalar array instead:
array=28.69.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Can anyone help address this issue?

Tanul

1 Answer


Series.apply calls the function once per element of the Series (for a DataFrame, once per row or once per column, depending on the chosen axis). That is why scale receives a single float such as 28.69 rather than the whole column, which is exactly what the error message reports. Your function, however, is meant to process the entire Series and return a new Series in its place.
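To see the element-wise behaviour for yourself, here is a minimal sketch. The sample values and the helper name inspect are made up for illustration; only Series.apply itself comes from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for df['Value'].
values = pd.Series([28.69, 30.10, 25.42])

received = []  # one entry per call made by apply

def inspect(x):
    # Record whether apply handed us a whole Series or a single element.
    received.append(isinstance(x, pd.Series))
    return x

values.apply(inspect)

# One call per element, and none of the calls received a Series.
print(len(received), any(received))
```

The function is called three times, once per value, and never sees the Series as a whole, so anything that needs a 2D input (like StandardScaler) fails inside it.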

You can therefore simply run:

StandardScaler().fit_transform(df['Value'].values.reshape(-1, 1))

StandardScaler expects a 2D array as input, where each row is a sample that consists of one or more features. Even if there is just a single feature (as seems to be the case in your example), the input has to have the right dimensions. Therefore, before handing your Series over to sklearn, I access its values (the numpy representation) and reshape them into a single column.

For more details on reshape(-1, ...) check this out: What does -1 mean in numpy reshape?
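As a rough illustration of what the reshape does to the shape of the data, assuming a handful of made-up float values in place of df['Value'].values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values; in the question these would come from df['Value'].values.
raw = np.array([28.69, 30.10, 25.42, 31.77])

# -1 lets numpy infer the number of rows; 1 means a single feature column.
column = raw.reshape(-1, 1)
print(raw.shape, column.shape)

scaled = StandardScaler().fit_transform(column)

# The result keeps the 2D shape and has (approximately) zero mean and unit variance.
print(scaled.shape)
print(np.isclose(scaled.mean(), 0.0), np.isclose(scaled.std(), 1.0))
```

The 1D array of four values becomes a 4x1 column, which is the "samples by features" layout that fit_transform asks for.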

Now, the best bit. If your entire DataFrame consists of a single column you could simply do:

StandardScaler().fit_transform(df)

And even if it doesn't, you could still avoid the reshape:

StandardScaler().fit_transform(df[['Value']])

Note how in this case 'Value' is surrounded by two sets of square brackets, so the result is not a Series but a DataFrame containing a subset of the original columns (useful when you do not want to scale all of them). Since a DataFrame is already 2-dimensional, you don't need to worry about reshaping.
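The difference between the two indexing styles is easy to check. This is only a sketch: the DataFrame and its 'Other' column are invented for the example.

```python
import pandas as pd

# Hypothetical frame; 'Other' exists only to show that a subset is selected.
df = pd.DataFrame({'Value': [28.69, 30.10, 25.42], 'Other': [1, 2, 3]})

as_series = df['Value']    # single brackets: 1-dimensional Series
as_frame = df[['Value']]   # double brackets: 2-dimensional DataFrame with one column

print(type(as_series).__name__, as_series.ndim, as_series.shape)
print(type(as_frame).__name__, as_frame.ndim, as_frame.shape)
```

The Series has shape (3,), while the one-column DataFrame has shape (3, 1), which is already the layout StandardScaler accepts.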

Finally, if you want to scale just some of the columns and update your original DataFrame all you have to do is:

>>> import pandas as pd
>>> from sklearn.preprocessing import StandardScaler
>>> df = pd.DataFrame({'A': [1,2,3], 'B': [0,5,6], 'C': [7, 8, 9]})
>>> columns_to_scale = ['A', 'B']
>>> df[columns_to_scale] = StandardScaler().fit_transform(df[columns_to_scale])
>>> df
          A         B  C
0 -1.224745 -1.397001  7
1  0.000000  0.508001  8
2  1.224745  0.889001  9
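If you would rather keep a scale helper like the one in the question, one possible shape for it is sketched below. This is an assumption about what the function was meant to do (return a scaled Series), not the only way to write it, and the sample data is made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale(series):
    """Return a standardized copy of a numeric Series, keeping its index and name."""
    # Reshape to one column for sklearn, then flatten the result back to 1D.
    scaled = StandardScaler().fit_transform(series.to_numpy().reshape(-1, 1))
    return pd.Series(scaled.ravel(), index=series.index, name=series.name)

# Hypothetical data standing in for the original DataFrame.
df = pd.DataFrame({'Value': [28.69, 30.10, 25.42]})

# Call the function on the whole Series directly, not through apply.
df['Value'] = scale(df['Value'])
print(df)
```

The key differences from the original function are that it returns the transformed values instead of printing the untouched input, and that it is called as scale(df['Value']) rather than df['Value'].apply(scale).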
rudolfovic