Pandas groupby/apply has different behaviour with int and string types

Question

I have the following dataframe

and two very similar functions:

def func1(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 1
    else:
        x['D'] = 0
    return x[['X', 'D']]

def func2(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]

Now I can groupby/apply these functions

df.groupby('X').apply(func1)
df.groupby('X').apply(func2)

The first line gives me what I want, i.e.

But the second line returns something quite strange

   X  D
0  A  u
1  A  u
2  A  u
3  A  u
4  A  u
5  A  u
6  A  u
7  A  u

So my questions are:

Can anybody explain why the behavior of groupby/apply is different when the type changes?
How can I get something similar with func2?

Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) — Erfan, Jul 09 '19 at 23:12
Don't really see the point of groupby here, seems you simply want: `df['D'] = np.where(df['X'].eq('A'), 'u', 'v')` — Erfan, Jul 09 '19 at 23:12
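For reference, the vectorized approach suggested in the comment above can be sketched as follows. The sample frame is hypothetical: the question's original dataframe is not shown, so it contains only the X values that appear in the accepted answer's output.

```python
import numpy as np
import pandas as pd

# Hypothetical input: only the X column, with the values that appear
# in the accepted answer's output (the original frame is not shown).
df = pd.DataFrame({'X': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']})

# np.where picks 'u' where the condition holds and 'v' elsewhere,
# in one vectorized pass with no groupby and no per-group function.
df['D'] = np.where(df['X'].eq('A'), 'u', 'v')
print(df)
```

This sidesteps the groupby question entirely, which is why the asker notes it does not answer what was asked.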
Thanks @Erfan you're right for this very example but this is not the questions I asked — Antoine Falck, Jul 09 '19 at 23:30
Your second df's column X only contains A, that is why the output is u only — BENY, Jul 10 '19 at 00:42
@WeNYoBen Both outputs come from the same (given) input. Interested in an explanation as well. — Quang Hoang, Jul 10 '19 at 01:15

score 0 · Accepted Answer · answered Jul 10 '19 at 13:33

The problem is simply that a function applied to a GroupBy should never try to change the dataframe it receives. Whether that dataframe is a copy (which can safely be changed, although the changes will not be seen in the original dataframe) or a view is implementation-dependent. The choice is made internally by pandas, and as a user you should treat mutating it as forbidden.
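One way to follow that rule without an explicit copy is to build a new frame instead of assigning into the argument. This is a minimal sketch; `func2_assign` is a hypothetical name, and it relies on `DataFrame.assign`, which returns a new object and leaves the frame it is called on unchanged. It is called directly on a small stand-in frame, not through groupby, so the check does not depend on how a given pandas version passes groups to `apply`.

```python
import pandas as pd

def func2_assign(x):
    # Hypothetical variant of func2: DataFrame.assign returns a NEW frame,
    # so the frame passed in is never modified.
    value = 'u' if x.iloc[0]['X'] == 'A' else 'v'
    return x.assign(D=value)[['X', 'D']]

# A small frame standing in for one group (hypothetical data).
group = pd.DataFrame({'X': ['A', 'A']})
result = func2_assign(group)

print(list(group.columns))  # the input still has only column X
print(result)
```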

The correct way is to force a copy:

def func2(x):
    x = x.copy()
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]

After that, df.groupby('X').apply(func2).reset_index(level=0, drop=True) gives the expected result:

   X  D
0  A  u
1  A  u
2  A  u
3  A  u
4  B  v
5  B  v
6  B  v
7  B  v
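If you would rather not depend on how `groupby.apply` labels its result (its handling of group keys in the returned index has changed between pandas releases), the same per-group logic can be written as an explicit loop over the groups. This is a sketch with a hypothetical input frame: it copies each group before adding D, as the corrected func2 does, and then concatenates the pieces back in the original row order.

```python
import pandas as pd

def func2(x):
    # Work on a copy so the frame handed out by groupby is never mutated.
    x = x.copy()
    x['D'] = 'u' if x.iloc[0]['X'] == 'A' else 'v'
    return x[['X', 'D']]

# Hypothetical input, matching the X values in the output above.
df = pd.DataFrame({'X': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']})

# Iterating a GroupBy yields (key, sub-frame) pairs; each sub-frame keeps
# the original row labels, so sort_index() restores the input order.
pieces = [func2(group) for _, group in df.groupby('X')]
out = pd.concat(pieces).sort_index()
print(out)
```

The loop is more verbose than `apply`, but every group is an ordinary sub-frame, so there is no question of whether it is a view or a copy.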
