Mean normalization different versions of code

Question

I want to mean-normalize my data frame. When I run the first version of the code I get the normalized values, but when I run version 2 I get a StopIteration error. ["1B","2B","3B","HR","BB"] are columns in my data frame.

Version 1:

def meanNormalizeRates(df):
    subRates = df[["1B","2B","3B","HR","BB"]]
    df[["1B","2B","3B","HR","BB"]] = subRates - subRates.mean(axis=0)
    return df

stats = stats.groupby('yearID').apply(meanNormalizeRates)
stats.head()

Version 2:

def mean(df):
    for val in ["1B","2B","3B","HR","BB"]:
        stats[val] = stats[val] - stats[val].mean(axis=0)

stats = stats.groupby('yearID').apply(mean)

stats.head()

I can't understand the difference between the two versions.

A good example

import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
        'gate': [9, 7, 4, 6, 9]}

frame = pd.DataFrame(data)
frame.head()

Version 1.1

def std(df):
    temp = df[['gate', 'pop']]
    df[['gate', 'pop']] = temp - temp.mean(axis=0)
    return df
frame.groupby('year').apply(std)

    gate    pop state   year
0   9   1.5 Ohio    2000
1   7   1.7 Ohio    2001
2   4   3.6 Ohio    2002
3   6   2.4 Nevada  2001
4   9   2.9 Nevada  2002

Version 1.2

def mean(df):
    for val in ['gate', 'pop']:
        df[val] = df[val]- df[val].mean(axis=0)

frame.groupby('year').apply(mean)

Error: StopIteration

Can you try to include some [reproducible example data](http://stackoverflow.com/q/20109391/1222578)? It's a bit hard to tell why your results might differ without it. — Marius, Jan 19 '15 at 05:11

score 1 · Accepted Answer · answered Jan 19 '15 at 09:27

OK, so because you don't have a return statement in your mean() function (in example 1.2), that function just returns None for every group. The StopIteration error you get isn't all that clear, but what is happening is:

1. apply() calls your mean() function on every group.
2. Each of those calls returns None.
3. The results get put into a list, so here it's a list of all Nones.
4. As part of trying to stitch the results back together, apply() tries to find the non-None values in the list, which throws a StopIteration exception.

So basically you can reproduce the error by doing:

eg_list = [None, None, None]
v = next(v for v in eg_list if v is not None)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-12-93b31b7a51e4> in <module>()
----> 1 v = next(v for v in eg_list if v is not None)

All that might be too much detail, though. The takeaway is that when you're using apply(), you shouldn't make your changes inside the function you're applying. Instead, return a result from the function and assign it back to the dataframe, like:

# The lambda here will return the relevant values of gate and pop,
# and we just assign them wherever we want in the dataframe.
# Could be new columns, could be existing ones
frame[['gate', 'pop']] = frame.groupby('year')[['gate', 'pop']].apply(
    lambda group: group - group.mean(axis=0))
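
As a hedged alternative to the apply() call above, here is a minimal sketch using groupby(...).transform('mean'), which returns the per-group means aligned to the original rows, so the subtraction does not depend on how apply() indexes its result (that has, as far as I know, varied between pandas versions). It rebuilds the example frame from the question; the names cols, group_means, centered and result are illustrative and not from the original post.

```python
import pandas as pd

# Example data from the question
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
        'gate': [9, 7, 4, 6, 9]}
frame = pd.DataFrame(data)

cols = ['gate', 'pop']  # columns to mean-normalize (illustrative name)

# transform('mean') gives each row the mean of its own year group,
# with the same index as the original frame
group_means = frame.groupby('year')[cols].transform('mean')

# Subtract the group mean from each value, then build a new frame
# that keeps the untouched columns (state, year) alongside the result
centered = frame[cols] - group_means
result = frame.assign(**{c: centered[c] for c in cols})

print(result)
```

Building a new frame with assign() rather than modifying frame in place keeps the original data around for comparison; assigning back to frame[cols] would also work if in-place replacement is what you want.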

Aren't frame.groupby('year')[['gate', 'pop']] and frame.groupby('year') equal? — Elizabeth Susan Joseph, Jan 19 '15 at 11:05
I'm not sure how you can do things within the function, it's just not really how groupby and apply work. `frame.groupby('year')[['gate', 'pop']]` is almost the same as `frame.groupby('year')`, it just excludes the state column. — Marius, Jan 19 '15 at 11:15
