pandas groupby returns extra index

Question

import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    # Stand-in for a function that returns a DataFrame for each group
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

Given the above dataframe and function (which I cannot change), I'd like to run groupby + apply so that each group returns a dataframe, like this:

    C1  C2  X   Y
0   A   1   C   11
1   A   1   D   22
2   A   1   E   33
3   A   2   C   11
4   A   2   D   22
5   A   2   E   33
6   B   3   C   11
7   B   3   D   22
8   B   3   E   33
9   B   4   C   11
10  B   4   D   22
11  B   4   E   33

The following gives an extra, unnamed index level of row numbers:

dft.groupby(['C1','C2']).apply(lam3)

So I have to do the following to get what I want:

dft.groupby(['C1','C2']).apply(lam3).reset_index().drop(columns='level_2')

This is not generic: the name level_2 depends on how many columns I use in the groupby, and blindly dropping every column whose name starts with "level" could remove original columns.

How can I use a function that returns a dataframe in groupby + apply without getting the extra index level?

The question is similar to this, but each group here returns a dataframe, instead of a series.

EDIT: lam3 here is just an example function for demonstration. In the real version, there can be operations that depend on df. The point is that lam3 returns a dataframe in the context of groupby, so a cross join would not help.

lam3 here is just an example function for demonstration. In the real version, there are operations that depends on df. The point is that the lam3 function returns a dataframe, in the context of groupby. — iwbabn, Jun 08 '19 at 17:26

Quang Hoang · Accepted Answer · 2019-06-08T18:34:34.207

4

reset_index lets you pick an index level by position (level=-1 is the last one), and drop=True discards that level instead of turning it into a column. So you can try:

dft.groupby(['C1','C2']).apply(lam3).reset_index(level=-1, drop=True)

Output:

       X   Y
C1 C2       
A  1   C  11
   1   D  22
   1   E  33
   2   C  11
   2   D  22
   2   E  33
B  3   C  11
   3   D  22
   3   E  33
   4   C  11
   4   D  22
   4   E  33
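The output above still has C1 and C2 as index levels, whereas the question asks for them as ordinary columns with a plain 0..11 index. A minimal sketch of one way to get there, assuming the same dft and lam3 as in the question, is to chain a second reset_index() with no arguments, which moves the remaining (named) levels into columns. The variable name result is only for illustration.

```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    # Example function from the question: returns a DataFrame per group
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

result = (
    dft.groupby(['C1', 'C2'])
       .apply(lam3)
       .reset_index(level=-1, drop=True)  # discard the unnamed innermost level
       .reset_index()                     # turn C1 and C2 back into columns
)
print(result)
```

Because the level is selected by position rather than by its generated name, this does not depend on how many columns are passed to groupby.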

edited Jun 08 '19 at 18:34

answered Jun 08 '19 at 17:34


score 3 · Answer 2 · answered Nov 21 '19 at 12:51


Use group_keys=False in your groupby:

dft.groupby(['C1','C2'], group_keys=False).apply(lam3)
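It may be worth checking what this returns before relying on it. A small sketch, assuming the dft and lam3 from the question: with group_keys=False the group keys are not added to the index at all, so the result carries only the columns that lam3 itself returns (X and Y here), and C1 and C2 do not appear unless the function puts them in its output. The name no_keys is only for illustration.

```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    # Example function from the question: returns a DataFrame per group
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

# group_keys=False: the per-group frames are concatenated without
# prepending the group keys as index levels
no_keys = dft.groupby(['C1', 'C2'], group_keys=False).apply(lam3)

print(no_keys.columns.tolist())  # columns come from lam3 only
print(no_keys.index.nlevels)     # a single index level, no C1/C2 levels
```

So this removes the extra index level, but on its own it does not reproduce the table in the question, which also needs the group keys as columns.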

answered Nov 21 '19 at 12:51

Steve Alexander

