
I have a df with 2171 columns and 200+ rows. I want to normalize a range of those cols.

[Input df]

Time           '340.0'   '341.0'   '342.0'   'Mode'
11:30:15 PM    0.25       0.35      0.65      light
11:31:15 PM    0.22       0.30      0.62      auto
11:32:15 PM    0.32       0.39      0.98      auto
.
.
.

[Code if only used on one col, I am not sure how to apply to a range of cols]

sr_df['340.0'] = sr_df['340.0'].apply(lambda x: (x - x.mean()) / (x.std()))

I am very new to python and I am not sure why it is giving me the following error:

AttributeError: 'float' object has no attribute 'mean'
Brain_overflowed

2 Answers


You can apply your normalization to all desired columns at once, assigning the result back so the change sticks:

sr_df[['340.0', '341.0', '342.0']] = sr_df[['340.0', '341.0', '342.0']].apply(lambda x: (x - x.mean()) / x.std())

>>> sr_df
          Time     340.0     341.0     342.0   Mode
0  11:30:15 PM -0.259828  0.073922 -0.500626  light
1  11:31:15 PM -0.844441 -1.034910 -0.650814   auto
2  11:32:15 PM  1.104269  0.960988  1.151440   auto

Better yet, you can apply it to all numeric columns (if that's what you're going for):

import numpy as np

# Get a list of numeric columns:
cols = list(sr_df.select_dtypes(include=[np.number]).columns.values)

sr_df[cols] = sr_df[cols].apply(lambda x: ((x-x.mean()) / (x.std())))
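With 2171 columns, listing names by hand won't scale. If the columns you want to normalize are contiguous, you can also select the whole range by label with .loc, since label slices include both endpoints. A sketch on a toy frame (the values are invented; the column names follow the question):

```python
import pandas as pd

# Toy stand-in for sr_df, using the question's column names
sr_df = pd.DataFrame({
    "Time": ["11:30:15 PM", "11:31:15 PM", "11:32:15 PM"],
    "340.0": [0.25, 0.22, 0.32],
    "341.0": [0.35, 0.30, 0.39],
    "342.0": [0.65, 0.62, 0.98],
    "Mode": ["light", "auto", "auto"],
})

# .loc slices columns by label; both endpoints are included
block = sr_df.loc[:, "340.0":"342.0"]
sr_df.loc[:, "340.0":"342.0"] = (block - block.mean()) / block.std()
```

In the real frame the endpoints would be the first and last of the 2171 spectral columns.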

Fixing your code:

If you want to fix your code, apply your function to a one-column DataFrame rather than to the Series itself. The reason it doesn't work on a Series is outlined in this answer by @BrenBarn:

When you use apply on a series, your function is called on each element. When you use apply on a DataFrame, your function is called on each column.

So the way you're doing it, you're calling mean and std on each individual float, and floats have no such attributes, hence your error: AttributeError: 'float' object has no attribute 'mean'

# this works:
sr_df[['340.0']].apply(lambda x: (x - x.mean()) / (x.std()))

# This doesn't:
# sr_df['340.0'].apply(lambda x: (x - x.mean()) / (x.std()))

# The difference is this:
>>> type(sr_df['340.0'])
<class 'pandas.core.series.Series'>
>>> type(sr_df[['340.0']])
<class 'pandas.core.frame.DataFrame'>
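As an aside, apply isn't strictly needed here: arithmetic on a DataFrame already works column-wise, so the normalization can be written directly. A sketch on a toy frame mirroring the question's data:

```python
import pandas as pd

# Toy stand-in for sr_df (values invented for illustration)
sr_df = pd.DataFrame({
    "340.0": [0.25, 0.22, 0.32],
    "341.0": [0.35, 0.30, 0.39],
    "342.0": [0.65, 0.62, 0.98],
})

cols = ["340.0", "341.0", "342.0"]
# mean() and std() return one value per column; the subtraction and
# division then broadcast those per-column values across the rows
sr_df[cols] = (sr_df[cols] - sr_df[cols].mean()) / sr_df[cols].std()
```

This gives the same result as the apply version, and is the more idiomatic pandas spelling.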
sacuL

You may also use the MinMaxScaler from sklearn. It will automatically fit and scale all values to the range 0 to 1. See this example and this one.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

columns = ['340.0', '341.0', '342.0']
df[columns] = scaler.fit_transform(df[columns])
Matthias
  • That is a different scaling method than the one OP requested. Closer to the desired method is `StandardScaler` from `sklearn` (which matches what OP requested, except that it uses the `numpy` convention for the standard deviation rather than the `pandas` one: it divides by n (ddof=0) rather than n-1 (ddof=1), giving slightly different results) – sacuL Jul 23 '18 at 23:17
  • True. Thanks for comment. – Matthias Jul 23 '18 at 23:55
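To make the ddof point from the comment concrete (toy numbers, just for the sketch):

```python
import numpy as np
import pandas as pd

s = pd.Series([0.25, 0.22, 0.32])

pandas_std = s.std()              # pandas defaults to ddof=1: divides by n - 1
numpy_std = np.std(s.to_numpy())  # numpy defaults to ddof=0: divides by n

# Passing ddof explicitly reconciles the two conventions
print(pandas_std, numpy_std, s.std(ddof=0))
```

This is why StandardScaler (which uses the numpy convention) and the pandas lambda in the accepted answer give slightly different numbers on the same data.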