
I have a dataframe as below:

   Wash_Month  Wash_Day
0           3         2
1           4         3

And the expected output is:

d={'Wash_Month':'Wash_Month/Wash_Day','Wash_Day':'Wash_Month/Wash_Day'}

df.T.astype(str).groupby(d).agg(','.join)
Out[329]: 
                       0    1
Wash_Month/Wash_Day  3,2  4,3

As you can see, I first transpose with T.

If I group by with axis=1 and remove the T, I expect the same output:

df.astype(str).groupby(d,axis=1).agg(','.join)
Out[330]: 
   Wash_Month/Wash_Day
0  Wash_Month,Wash_Day
1  Wash_Month,Wash_Day

The output does not match what I expected. Is there a specific problem with using agg with join when grouping with axis=1?

Other aggregation functions such as sum work as normal:

df.astype(str).groupby({'Wash_Month':'Wash_Month/Wash_Day','Wash_Day':'Wash_Month/Wash_Day'}, axis=1).sum()
Out[332]: 
   Wash_Month/Wash_Day
0                 32.0 # str 3 + str 2
1                 43.0

For why the result becomes a float rather than a str, check the link.

Appreciate your help :-)

BENY

1 Answer


Here is a hint:

def f(x):
    print(x)
    print(type(x))
    return 1

df.astype(str).groupby(d,axis=1).agg(f)

Output:

  Wash_Month Wash_Day
0          3        2
1          4        3
<class 'pandas.core.frame.DataFrame'>

Note that f receives the whole group as a DataFrame.

As opposed to:

def f(x):
    print(x)
    print(type(x))
    return 1

df.T.astype(str).groupby(d).agg(f)

Output:

Wash_Month    3
Wash_Day      2
Name: 0, dtype: object
<class 'pandas.core.series.Series'>
Wash_Month    4
Wash_Day      3
Name: 1, dtype: object
<class 'pandas.core.series.Series'>

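Here f gets called with each Series, so ','.join concatenates the values. In the axis=1 case f receives a DataFrame, and iterating over a DataFrame yields its column labels, hence ','.join concatenates the column headers.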
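The difference can be reproduced without groupby at all. The snippet below is a minimal sketch that rebuilds the question's dataframe and calls ','.join directly on a DataFrame and on a Series; the variable names joined_frame and joined_series are only illustrative.

```python
import pandas as pd

# Rebuild the dataframe from the question, with values cast to str.
df = pd.DataFrame({"Wash_Month": [3, 4], "Wash_Day": [2, 3]}).astype(str)

# Iterating over a DataFrame yields its column labels, not its values,
# so join sees "Wash_Month" and "Wash_Day".
joined_frame = ",".join(df)

# Iterating over a Series yields its values, so join sees "3" and "2".
joined_series = ",".join(df.iloc[0])

print(joined_frame)   # Wash_Month,Wash_Day
print(joined_series)  # 3,2
```

This matches the two outputs in the question: column headers when the function is handed a DataFrame, joined values when it is handed a Series.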

I can't explain it without digging through the source code, but it appears that the groupby along with astype(str) is causing agg to act differently in each situation.
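One possible workaround, offered as a sketch rather than as the fix for the groupby behaviour itself, is to skip groupby and apply the join row by row. It assumes all columns belong to a single group, as in the question's mapping d; the names joined and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Wash_Month": [3, 4], "Wash_Day": [2, 3]})

# apply(..., axis=1) passes each row to the function as a Series,
# so ",".join sees the row's values rather than the column labels.
joined = df.astype(str).apply(",".join, axis=1)

# Name the resulting column after the group label used in the question.
result = joined.to_frame("Wash_Month/Wash_Day")
print(result)
```

With more than one target group, the same idea would have to be repeated per group of columns, which this sketch does not cover.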

Scott Boston