
I have the following dataframe:

   X    Y
0  A   10
1  A    9
2  A    8
3  A    5
4  B  100
5  B   90
6  B   80
7  B   50

and two very similar functions:

def func1(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 1
    else:
        x['D'] = 0
    return x[['X', 'D']]

def func2(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
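To reproduce this, the example frame can be rebuilt as below. It is only a reconstruction of the table shown above. The name `df` matches the groupby calls used in the question.

```python
import pandas as pd

# Rebuild the example frame from the question: column X holds the group
# label and column Y holds arbitrary integer values.
df = pd.DataFrame({
    'X': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Y': [10, 9, 8, 5, 100, 90, 80, 50],
})
print(df)
```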

Now I can use groupby/apply with these functions:

df.groupby('X').apply(func1)
df.groupby('X').apply(func2)

The first line gives me what I want:

   X  D
0  A  1
1  A  1
2  A  1
3  A  1
4  B  0
5  B  0
6  B  0
7  B  0

But the second line returns something quite strange (every row ends up with X = A and D = u):

   X  D
0  A  u
1  A  u
2  A  u
3  A  u
4  A  u
5  A  u
6  A  u
7  A  u

So my questions are:

  • Can anybody explain why the behavior of groupby/apply is different when the type changes?
  • How can I get something similar with func2?

Comments:

  • Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – Erfan Jul 09 '19 at 23:12
  • I don't really see the point of groupby here; it seems you simply want `df['D'] = np.where(df['X'].eq('A'), 'u', 'v')` – Erfan Jul 09 '19 at 23:12
  • Thanks @Erfan, you're right for this particular example, but that is not the question I asked. – Antoine Falck Jul 09 '19 at 23:30
  • Your second df's column X only contains A; that is why the output is only u. – BENY Jul 10 '19 at 00:42
  • @WeNYoBen Both outputs come from the same (given) input. I'm interested in an explanation as well. – Quang Hoang Jul 10 '19 at 01:15
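Erfan's suggestion can be written out as a complete snippet. It is a sketch assuming the same example frame. It sidesteps groupby entirely because D depends only on each row's own X value, so it answers "how to get the result" but not "why apply misbehaves".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Y': [10, 9, 8, 5, 100, 90, 80, 50],
})

# Vectorized conditional column: 'u' where X == 'A', otherwise 'v'.
# No per-group function is called, so nothing can be mutated behind our back.
df['D'] = np.where(df['X'].eq('A'), 'u', 'v')
print(df[['X', 'D']])
```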

1 Answer

The problem is simply that a function applied to a GroupBy should never try to change the dataframe it receives. Whether that dataframe is a copy (which can safely be changed, although the changes will not be seen in the original dataframe) or a view is implementation-dependent. The choice is made internally by pandas, and as a user you should simply treat modifying it as forbidden.
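One way to avoid depending on that copy-or-view choice at all is to iterate over the groups yourself and concatenate the pieces. The sketch below is an illustrative alternative, not a description of what `apply` does internally. `label_group` is a hypothetical helper that mirrors `func2` but never mutates its input.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Y': [10, 9, 8, 5, 100, 90, 80, 50],
})

def label_group(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring func2 without touching its argument."""
    out = group.copy()  # private copy: changes cannot leak back into df
    out['D'] = 'u' if out['X'].iloc[0] == 'A' else 'v'
    return out[['X', 'D']]

# Iterate over (key, sub-frame) pairs and stitch the results back together.
# Each sub-frame keeps its original row labels, so sort_index restores order.
pieces = [label_group(group) for _, group in df.groupby('X')]
result = pd.concat(pieces).sort_index()
print(result)
```

The explicit loop is slower than a vectorized expression on large frames, but it makes the "one private copy per group" rule visible in the code.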

The correct way is to force a copy:

def func2(x):
    # Work on a private copy so the group passed in by pandas is never mutated
    x = x.copy()
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]

After that change, `df.groupby('X').apply(func2).reset_index(level=0, drop=True)` gives the expected result:

   X  D
0  A  u
1  A  u
2  A  u
3  A  u
4  B  v
5  B  v
6  B  v
7  B  v
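For the second question, a groupby-based variant that never hands your function a sub-frame to modify is `transform`. The function only computes one value per group, and pandas broadcasts it to that group's rows. This is a sketch assuming the same example frame. `pick_label` is a hypothetical name, and grouping the X column by itself is just one convenient way to write it.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Y': [10, 9, 8, 5, 100, 90, 80, 50],
})

def pick_label(x: pd.Series) -> str:
    # Return a single scalar per group; transform broadcasts it to every row
    return 'u' if x.iloc[0] == 'A' else 'v'

# Group the X column by its own values and build D without mutating any group
df['D'] = df['X'].groupby(df['X']).transform(pick_label)
print(df[['X', 'D']])
```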
Serge Ballesta