```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})
```
Given the above dataframe and function (which I cannot change), I'd like to run groupby + apply so that each group returns a dataframe, giving this result:
```
    C1  C2  X   Y
0    A   1  C  11
1    A   1  D  22
2    A   1  E  33
3    A   2  C  11
4    A   2  D  22
5    A   2  E  33
6    B   3  C  11
7    B   3  D  22
8    B   3  E  33
9    B   4  C  11
10   B   4  D  22
11   B   4  E  33
```
Doing the following gives an extra level of numbered index:
```python
dft.groupby(['C1', 'C2']).apply(lam3)
```
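To see where the extra index comes from, here is a small sketch (using the same `dft` and `lam3` as above) that inspects the result. The group keys become the outer index levels, and each returned frame's own index (0, 1, 2) is kept as an unnamed innermost level, which `reset_index` then turns into a column called `level_2`.

```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

res = dft.groupby(['C1', 'C2']).apply(lam3)

# Outer levels are the group keys; the last level is each returned frame's index.
print(list(res.index.names))  # ['C1', 'C2', None]

# The unnamed innermost level is what shows up as 'level_2' after reset_index().
print(list(res.reset_index().columns))
```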
So I have to do the following to get what I want:
```python
dft.groupby(['C1', 'C2']).apply(lam3).reset_index().drop(columns='level_2')
```
However, this is not generic: the name `level_2` depends on how many columns I use in the groupby, and blindly dropping columns whose names start with "level" could remove original columns.
How can I use a function that returns a dataframe in groupby without getting the extra index?
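One position-based alternative, sketched here under the assumption that the extra index is always the innermost level (true for the `lam3` example, where every returned frame has a different index from its group): drop the last level with `droplevel(-1)` instead of dropping a column by name, then turn the group keys back into columns. This does not depend on how many groupby columns are used and never touches the original columns.

```python
import pandas as pd

dft = pd.DataFrame({'C1': ['A', 'A', 'B', 'B'],
                    'C2': [1, 2, 3, 4]})

def lam3(df):
    return pd.DataFrame({'X': ['C', 'D', 'E'],
                         'Y': [11, 22, 33]})

res = (dft.groupby(['C1', 'C2'])
          .apply(lam3)
          .droplevel(-1)   # drop the innermost (per-returned-frame) index level
          .reset_index())  # group keys C1, C2 become ordinary columns again
print(res)
```

`.reset_index(level=-1, drop=True)` should be an equivalent way to discard the innermost level.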
The question is similar to this, but each group here returns a dataframe, instead of a series.
EDIT: lam3 here is just an example function for demonstration. In the real version, there can be operations that depend on df. The point is that lam3 returns a dataframe in the context of groupby, so a cross join would not help.
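The same position-based idea should also work when the function really does depend on each group's contents. Below is a minimal sketch with a hypothetical helper `apply_frames`, hypothetical data `data` (with an extra value column `V`), and a hypothetical group-dependent function `summarize`; none of these come from the question. The function reads only the non-key column `V`, because recent pandas versions may exclude the grouping columns from the frame passed to `apply`.

```python
import pandas as pd

def apply_frames(df, keys, func):
    """Hypothetical helper: apply `func` (which returns a DataFrame) to each
    group and return a flat frame with the group keys as ordinary columns."""
    out = df.groupby(keys, group_keys=True).apply(func)
    # The last index level is the index of each returned frame; drop it,
    # then move the remaining group-key levels back into columns.
    return out.droplevel(-1).reset_index()

# Hypothetical example data: two key columns plus a value column V.
data = pd.DataFrame({'C1': ['A', 'A', 'B'],
                     'C2': [1, 1, 2],
                     'V': [1, 2, 5]})

def summarize(df):
    # Output depends on the group's contents: one row per statistic of V.
    return pd.DataFrame({'stat': ['min', 'max', 'sum'],
                         'value': [df['V'].min(), df['V'].max(), df['V'].sum()]})

result = apply_frames(data, ['C1', 'C2'], summarize)
print(result)
```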