Pandas groupby function using multiple columns

Question

This is similar to the following question, but I want to take it one step further: pandas groupby apply on multiple columns to generate a new column

I have this dataframe:

    Group  Value  Part    Ratio
0     A    6373    10    0.637300
1     A    2512    10    0.251200
2     A    603     10    0.060300
3     A    512     10    0.051200
4     B    5200    20    0.472727
5     B    4800    20    0.436364
6     B    501     20    0.045545
7     B    499     20    0.045364

I also have this function, which uses BOTH the 'Ratio' and 'Part' columns, that I'd like to apply to each 'Group':

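import numpy as np

def allocation(df, ratio, part):
    k = df[part].max()  # total number of units to hand out in this group
    # Split each ideal share into its fractional and whole parts.
    frac, results = np.array(np.modf(k * df[ratio]))
    remainder = int(k - results.sum())  # units left after rounding down
    indices = np.argsort(frac)[::-1]  # rows with the largest fractions first
    results[indices[0:remainder]] += 1
    return results.astype(int)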

Unlike the function in the question I referred to at the top, my function returns an array of values for the whole group instead of a single value. I tried the following:

data.groupby('Group', group_keys=False).apply(allocation, ratio='Ratio', part='Part')
Out[67]: 
Group
A    [6, 2, 1, 1]
B    [9, 9, 1, 1]
dtype: object

These numbers are correct. However, I need the output to be a series that I can assign back into the original dataframe, so that it would look something like this:

    Group  Value  Part    Ratio     Allocate
0     A    6373    10    0.637300     6
1     A    2512    10    0.251200     2
2     A    603     10    0.060300     1
3     A    512     10    0.051200     1
4     B    5200    20    0.472727     9
5     B    4800    20    0.436364     9
6     B    501     20    0.045545     1
7     B    499     20    0.045364     1

How would I go about doing this? Is using apply the correct approach?

score 1 · Answer 1 · answered Jul 12 '18 at 02:21

This usually happens when using apply with a user-defined function. We can fix it by using np.concatenate:

s = df.groupby('Group', group_keys=False).apply(allocation, ratio='Ratio', part='Part').values
df['Allocate'] = np.concatenate(s)
df
Out[71]: 
  Group  Value  Part     Ratio  Allocate
0     A   6373    10  0.637300         6
1     A   2512    10  0.251200         2
2     A    603    10  0.060300         1
3     A    512    10  0.051200         1
4     B   5200    20  0.472727         9
5     B   4800    20  0.436364         9
6     B    501    20  0.045545         1
7     B    499    20  0.045364         1

This approach is not always correct, because the result `s` is ordered by the group key, which may differ from the order in which the groups appear in the original frame `df`. To see this, construct another frame with `df2 = pd.concat([df[4:], df[:4]])` and repeat the same steps; the result will be wrong. - doraemon, Jul 12 '18 at 05:10

doraemon · Accepted Answer · 2018-07-12T05:14:39.993

To do it the pandas way, you can have the allocation function return a DataFrame or Series:

def allocation(df, ratio, part):
    k = df[part].max()
    frac, results = np.array(np.modf(k * df[ratio]))
    remainder = int(k - results.sum())
    indices = np.argsort(frac)[::-1]
    results[indices[0:remainder]] += 1
    # Return a copy with the new column instead of modifying the group in place.
    return df.assign(Allocate=results.astype(int))

Then groupby.apply gives you what you want directly:

In [61]: df.groupby('Group', group_keys=False).apply(allocation, ratio='Ratio', part='Part')
Out[61]:
  Group  Value  Part   Ratio  Allocate
0     A   6373    10  0.6373         6
1     A   2512    10  0.2512         2
2     A    603    10  0.0603         1
3     A    512    10  0.0512         1
4     B   5200    20  0.4727         9
5     B   4800    20  0.4364         9
6     B    501    20  0.0455         1
7     B    499    20  0.0454         1

This works even if the original dataframe is not sorted by Group. Try it on df2 = pd.concat([df.iloc[:2], df.iloc[6:], df.iloc[2:6]]).
