
I am just getting started with Python and machine learning and have run into an issue that I haven't been able to fix myself or with any online resource. I am trying to scale a column in a pandas DataFrame using a lambda function in the following way:

X['col1'] = X['col1'].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

and get the following error message:

'float' object has no attribute 'min'

I tried converting the column to integers, but then I get this error instead:

'int' object has no attribute 'min'

I believe I am getting something pretty basic wrong. I hope someone can point me in the right direction.
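To see where the error comes from, here is a small reproduction. It is a sketch that assumes illustrative values for `col1`. `Series.apply` calls the lambda once per element, so `x` is a single Python number rather than the whole column. With an integer column you get the `'int'` variant of the message, and with a float column you get the `'float'` variant.

```python
import pandas as pd

# Illustrative data (assumed); any numeric column behaves the same way
X = pd.DataFrame({'col1': [100, 10, 1, 20, 10, -20, 200]})

# Series.apply calls the lambda once per element, so x is a single scalar,
# not the whole column
element_types = set(X['col1'].apply(lambda x: type(x).__name__))
print(element_types)

# A plain Python number has no .min() method, which reproduces the error
error_message = None
try:
    X['col1'].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
except AttributeError as err:
    error_message = str(err)
print(error_message)
```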

Rob
    You could try `MinMaxScaler` from scikit-learn. See https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html – meW Jan 08 '19 at 07:30

2 Answers


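I think `apply` is not necessary here, because a faster vectorized solution exists. `Series.apply` calls the lambda once per element, so `x` is a single float (or int), which has no `min` method. Instead, replace `x` with the whole column `X['col1']`: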

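import pandas as pd
X = pd.DataFrame({'col1': [100,10,1,20,10,-20,200]})
# Vectorized min-max scaling: min() and max() are computed over the whole column
X['col2'] = (X['col1'] - X['col1'].min()) / (X['col1'].max() - X['col1'].min())
print(X)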

   col1      col2
0   100  0.545455
1    10  0.136364
2     1  0.095455
3    20  0.181818
4    10  0.136364
5   -20  0.000000
6   200  1.000000
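If you would rather keep a lambda, as in the question, another option (not part of the original answer) is `Series.pipe`. Unlike `Series.apply`, it passes the entire Series to the function, so `s.min()` and `s.max()` are column-wide. This is a minimal sketch reusing the sample data above.

```python
import pandas as pd

X = pd.DataFrame({'col1': [100, 10, 1, 20, 10, -20, 200]})

# Series.pipe calls the lambda once with the whole Series,
# so s.min() and s.max() are the column's minimum and maximum
X['col2'] = X['col1'].pipe(lambda s: (s - s.min()) / (s.max() - s.min()))
print(X)
```

The result is the same as the vectorized expression; `pipe` only changes how the lambda receives its input.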

As @meW pointed out in the comments, another solution is to use `MinMaxScaler`:

from sklearn import preprocessing

# MinMaxScaler expects 2-D input, so pass X[['col1']] (a one-column DataFrame)
min_max_scaler = preprocessing.MinMaxScaler()
X['col2'] = min_max_scaler.fit_transform(X[['col1']]).ravel()
print(X)

   col1      col2
0   100  0.545455
1    10  0.136364
2     1  0.095455
3    20  0.181818
4    10  0.136364
5   -20  0.000000
6   200  1.000000
jezrael
    Just tried it and it works, thanks. In fact, I was trying to implement the lambda function to verify that I had used `MinMaxScaler` correctly. The results match up. – Dario Raffaele Jan 08 '19 at 07:52
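The check described in the comment above (comparing a hand-written min-max formula against `MinMaxScaler`) can be done programmatically. This is a minimal sketch assuming the same sample data. It compares with `np.allclose` rather than `==` because both sides are floating-point results.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

X = pd.DataFrame({'col1': [100, 10, 1, 20, 10, -20, 200]})

# Manual min-max scaling with vectorized pandas operations
manual = (X['col1'] - X['col1'].min()) / (X['col1'].max() - X['col1'].min())

# MinMaxScaler needs 2-D input (X[['col1']]); ravel() flattens the (n, 1) output
scaled = MinMaxScaler().fit_transform(X[['col1']]).ravel()

# Compare with a floating-point tolerance instead of exact equality
results_match = np.allclose(manual.to_numpy(), scaled)
print(results_match)
```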

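Check the code below.

If you use `.apply` on the DataFrame, you need an if/else condition (here on `x.name`) so that only the target column is scaled. With `axis=0`, each `x` is a whole column and `x.name` is its label.

Otherwise, use the `assign` method, which gives the same result.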

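import pandas as pd

df = pd.DataFrame({'salary': [10, 29, 76, 54, 32]})

# axis=0 passes each column to the lambda; only 'salary' is scaled
df.apply(lambda x: (x - x.min()) / (x.max() - x.min()) if x.name == 'salary' else x, axis=0)

# assign passes the whole DataFrame to the lambda; both calls return a new DataFrame
df.assign(salary=lambda x: (x['salary'] - x['salary'].min()) / (x['salary'].max() - x['salary'].min()))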
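The if/else only matters when the DataFrame has other columns. This minimal sketch adds a hypothetical `age` column to show that the `x.name == 'salary'` check leaves it untouched. It uses a small helper, `min_max` (a hypothetical name for illustration), and checks that `apply` and `assign` agree. Neither call modifies `df` in place, so assign the result to a variable if you want to keep it.

```python
import pandas as pd

# 'age' is a hypothetical extra column, added to show that only 'salary' is scaled
df = pd.DataFrame({'salary': [10, 29, 76, 54, 32], 'age': [25, 40, 31, 52, 38]})

def min_max(s):
    """Scale a Series to the [0, 1] range (hypothetical helper)."""
    return (s - s.min()) / (s.max() - s.min())

# axis=0: the lambda receives one whole column at a time, so x.name is the column label
via_apply = df.apply(lambda x: min_max(x) if x.name == 'salary' else x, axis=0)

# assign: the lambda receives the whole DataFrame and returns the new 'salary' column
via_assign = df.assign(salary=lambda d: min_max(d['salary']))

# Both approaches should produce the same DataFrame; df itself is not modified
pd.testing.assert_frame_equal(via_apply, via_assign)
print(via_assign)
```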
Ran A