2
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

Given the above dataframe and function (which I cannot change), I'd like to run groupby+apply so that each group returns a dataframe, giving a result like this:

    C1  C2  X   Y
0   A   1   C   11
1   A   1   D   22
2   A   1   E   33
3   A   2   C   11
4   A   2   D   22
5   A   2   E   33
6   B   3   C   11
7   B   3   D   22
8   B   3   E   33
9   B   4   C   11
10  B   4   D   22
11  B   4   E   33

Doing the following adds an extra, numbered index level:

dft.groupby(['C1','C2']).apply(lam3)

So I have to do the following to get what I want:

dft.groupby(['C1','C2']).apply(lam3).reset_index().drop(columns='level_2')

This is not generic, though: the name level_2 depends on how many columns I use in the groupby, and blindly dropping every column whose name starts with "level" could remove original columns.
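To illustrate why the column name changes, here is a small sketch using the dataframe and function from above. The innermost index level comes from the frame that lam3 returns and has no name, so reset_index labels it by its position. The variable names by_two and by_one are only for illustration.

```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

# Two group keys: the unnamed inner level sits at position 2.
by_two = dft.groupby(['C1', 'C2']).apply(lam3)
print(list(by_two.index.names))          # group keys plus one unnamed level
print(list(by_two.reset_index().columns))

# One group key: the same unnamed level now sits at position 1,
# so reset_index calls it level_1 instead of level_2.
by_one = dft.groupby(['C1']).apply(lam3)
print(list(by_one.reset_index().columns))
```

Recent pandas versions may emit a deprecation warning about grouping columns being passed to apply; it does not affect this example because lam3 ignores its input.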

How can I use a function that returns a dataframe in groupby without getting the extra index?

The question is similar to this, but each group here returns a dataframe, instead of a series.

EDIT: lam3 here is just an example function for demonstration. In the real version, there can be operations that depend on df. The point is that lam3 returns a dataframe in the context of groupby, so a cross join would not help.

iwbabn
  • This is a so-called cross join. – BENY Jun 08 '19 at 17:23
  • lam3 here is just an example function for demonstration. In the real version, there are operations that depend on df. The point is that lam3 returns a dataframe in the context of groupby. – iwbabn Jun 08 '19 at 17:26
  • https://github.com/pandas-dev/pandas/issues/22546 – BENY Jun 08 '19 at 19:40

2 Answers

4

reset_index lets you select an index level by position, and drop=True discards that level instead of turning it into a column. So you can try:

dft.groupby(['C1','C2']).apply(lam3).reset_index(level=-1, drop=True) 

Output:

       X   Y
C1 C2       
A  1   C  11
   1   D  22
   1   E  33
   2   C  11
   2   D  22
   2   E  33
B  3   C  11
   3   D  22
   3   E  33
   4   C  11
   4   D  22
   4   E  33
Quang Hoang
3

Use group_keys=False in your groupby:

dft.groupby(['C1','C2'], group_keys=False).apply(lam3)
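One caveat worth checking on your own data: with group_keys=False the group keys are not added to the result at all, so for a function like lam3, which does not return C1 and C2 itself, those columns are missing from the output. The sketch below shows this, followed by one possible workaround that loops over the groups explicitly and attaches the keys by hand. The names no_keys, keys, pieces and result are illustrative, and the loop assumes that lam3's output does not already contain columns named like the keys.

```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

# group_keys=False: the per-group frames are concatenated as they are,
# so only X and Y remain and the index 0, 1, 2 repeats for every group.
no_keys = dft.groupby(['C1', 'C2'], group_keys=False).apply(lam3)
print(no_keys)

# Explicit loop: call lam3 per group and insert the key values as columns.
keys = ['C1', 'C2']
pieces = []
for key_values, group in dft.groupby(keys):
    out = lam3(group)
    for pos, (name, value) in enumerate(zip(keys, key_values)):
        out.insert(pos, name, value)  # raises if the column already exists
    pieces.append(out)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

The loop avoids the extra index level entirely, at the cost of being more verbose than a single apply call.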

Steve Alexander