
I have a df with 2171 columns and 200+ rows. I want to normalize a range of those cols.

[Input df]

Time           '340.0'   '341.0'   '342.0'   'Mode'
11:30:15 PM    0.25       0.35      0.65      light
11:31:15 PM    0.22       0.30      0.62      auto
11:32:15 PM    0.32       0.39      0.98      auto
.
.
.

[Code if only used on one col, I am not sure how to apply to a range of cols]

sr_df['340.0'] = sr_df['340.0'].apply(lambda x: (x - x.mean()) / (x.std()))

I am very new to python and I am not sure why it is giving me the following error:

AttributeError: 'float' object has no attribute 'mean'
Brain_overflowed

2 Answers


You can apply your normalization to all desired columns at once, assigning the result back so the change sticks:

sr_df[['340.0', '341.0', '342.0']] = sr_df[['340.0', '341.0', '342.0']].apply(lambda x: (x - x.mean()) / x.std())

>>> sr_df
          Time     340.0     341.0     342.0   Mode
0  11:30:15 PM -0.259828  0.073922 -0.500626  light
1  11:31:15 PM -0.844441 -1.034910 -0.650814   auto
2  11:32:15 PM  1.104269  0.960988  1.151440   auto

Better yet, you can apply it to all numeric columns (if that's what you're going for):

import numpy as np

# Get a list of numeric columns:
cols = list(sr_df.select_dtypes(include=[np.number]).columns.values)

sr_df[cols] = sr_df[cols].apply(lambda x: ((x-x.mean()) / (x.std())))
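With 2171 columns, listing names by hand won't scale. If the columns you want to normalize are contiguous, you can also select the whole range by label with .loc, since label slices include both endpoints. A sketch on a toy frame (the values are invented; the column names follow the question):

```python
import pandas as pd

# Toy stand-in for sr_df, using the question's column names
sr_df = pd.DataFrame({
    "Time": ["11:30:15 PM", "11:31:15 PM", "11:32:15 PM"],
    "340.0": [0.25, 0.22, 0.32],
    "341.0": [0.35, 0.30, 0.39],
    "342.0": [0.65, 0.62, 0.98],
    "Mode": ["light", "auto", "auto"],
})

# .loc slices columns by label; both endpoints are included
block = sr_df.loc[:, "340.0":"342.0"]
sr_df.loc[:, "340.0":"342.0"] = (block - block.mean()) / block.std()
```

In the real frame the endpoints would be the first and last of the 2171 spectral columns.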

Fixing your code:

If you want to fix your code, apply your function to a one-column DataFrame rather than to the Series itself. The reason it doesn't work on a Series is outlined in this answer by @BrenBarn:

When you use apply on a series, your function is called on each element. When you use apply on a DataFrame, your function is called on each column.

So the way you're doing it, you're calling mean and std on each individual float, and floats have no such attributes, hence your error: AttributeError: 'float' object has no attribute 'mean'

# this works:
sr_df[['340.0']].apply(lambda x: (x - x.mean()) / (x.std()))

# This doesn't:
# sr_df['340.0'].apply(lambda x: (x - x.mean()) / (x.std()))

# The difference is this:
>>> type(sr_df['340.0'])
<class 'pandas.core.series.Series'>
>>> type(sr_df[['340.0']])
<class 'pandas.core.frame.DataFrame'>
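As an aside, apply isn't strictly needed here: arithmetic on a DataFrame already works column-wise, so the normalization can be written directly. A sketch on a toy frame mirroring the question's data:

```python
import pandas as pd

# Toy stand-in for sr_df (values invented for illustration)
sr_df = pd.DataFrame({
    "340.0": [0.25, 0.22, 0.32],
    "341.0": [0.35, 0.30, 0.39],
    "342.0": [0.65, 0.62, 0.98],
})

cols = ["340.0", "341.0", "342.0"]
# mean() and std() return one value per column; the subtraction and
# division then broadcast those per-column values across the rows
sr_df[cols] = (sr_df[cols] - sr_df[cols].mean()) / sr_df[cols].std()
```

This gives the same result as the apply version, and is the more idiomatic pandas spelling.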
sacuL

You may also use the MinMaxScaler from sklearn. It will automatically fit and scale all values to the range 0 to 1. See this example and this one.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

columns = ['340.0', '341.0', '342.0']
df[columns] = scaler.fit_transform(df[columns])
Matthias
  • That is a different scaling method than the one OP requested. Closer to the desired method is `StandardScaler` from `sklearn` (which matches what OP requested, except that it uses the `numpy` convention for the standard deviation rather than the `pandas` one: it divides by n (ddof=0) rather than n-1 (ddof=1), giving slightly different results) – sacuL Jul 23 '18 at 23:17
  • True. Thanks for comment. – Matthias Jul 23 '18 at 23:55
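To make the ddof point from the comment concrete (toy numbers, just for the sketch):

```python
import numpy as np
import pandas as pd

s = pd.Series([0.25, 0.22, 0.32])

pandas_std = s.std()              # pandas defaults to ddof=1: divides by n - 1
numpy_std = np.std(s.to_numpy())  # numpy defaults to ddof=0: divides by n

# Passing ddof explicitly reconciles the two conventions
print(pandas_std, numpy_std, s.std(ddof=0))
```

This is why StandardScaler (which uses the numpy convention) and the pandas lambda in the accepted answer give slightly different numbers on the same data.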