
Several questions about converting a pandas groupby object to a DataFrame seem to involve aggregation, e.g. count() here.

Can a groupby object be converted to a DataFrame without aggregating, so that the group names become level 0 of a MultiIndex? And can this process be repeated to add further levels?

from pandas import DataFrame as DF

df = DF.from_dict({'a':1, 'b':2, 'c':3, 'd':4, 'e':5}, orient='index')

I would like the output of this grouping:

df.groupby(lambda x: df[0][x]%2)

converted to this form:

DF.from_dict({0:{'b':2,'d':4},1:{'a':1,'c':3,'e':5}},orient='index').stack().to_frame()


(Beside the point: why are the values converted to floats?)

alancalvitti

2 Answers

3

Use pd.concat; it accepts a dictionary:

pd.concat({k: v for k, v in df.groupby(lambda x: df.loc[x, 0] % 2)})

     0
0 b  2
  d  4
1 a  1
  c  3
  e  5

Iterating over a groupby yields (key, sub-DataFrame) pairs, so a dictionary comprehension is enough to build the dictionary. pd.concat then uses the dictionary keys as the new outer index level.
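A minimal, self-contained sketch of the same idea, assuming the df from the question. The level names 'parity' and 'label' are made up for illustration; the names argument of pd.concat is optional.

```python
import pandas as pd

df = pd.DataFrame.from_dict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}, orient='index')

# Iterating a groupby yields (group key, sub-frame) pairs.
groups = {key: sub for key, sub in df.groupby(df[0] % 2)}

# The dict keys become the outermost index level.
# 'parity' and 'label' are hypothetical level names chosen for this example.
out = pd.concat(groups, names=['parity', 'label'])
print(out)
```

Naming the levels is not required, but it makes later calls such as groupby(level=...) or reset_index() easier to read.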


A slightly faster solution that avoids the callable is:

pd.concat({k: v for k, v in df.groupby(df.iloc[:,0] % 2)})

     0
0 b  2
  d  4
1 a  1
  c  3
  e  5
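On the side question about floats, a likely explanation is that the nested from_dict call first builds a wide frame in which every row is missing some columns. Those gaps are filled with NaN, and NaN forces the columns to float64. The sketch below checks this and shows that the pd.concat route keeps the integers; the variable names are only for illustration.

```python
import pandas as pd

nested = {0: {'b': 2, 'd': 4}, 1: {'a': 1, 'c': 3, 'e': 5}}
wide = pd.DataFrame.from_dict(nested, orient='index')

# Each row lacks some of the columns, so the gaps are filled with NaN ...
has_gaps = bool(wide.isna().any().any())
# ... and a column holding NaN cannot stay an integer column.
wide_dtypes = wide.dtypes

df = pd.DataFrame.from_dict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}, orient='index')
# pd.concat only stacks the existing sub-frames, so no NaN is introduced.
stacked = pd.concat({k: v for k, v in df.groupby(df.iloc[:, 0] % 2)})

print(has_gaps)
print(wide_dtypes)
print(stacked.dtypes)
```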

If you need to do this repeatedly, wrap it in a function:

def add_level(df, grouper):
    return pd.concat({k: v for k, v in df.groupby(by=grouper)})

r = add_level(df, df.iloc[:,0] % 3)
add_level(r, r.iloc[:, 0] % 2)

       0
0 1 d  4
  2 b  2
1 0 c  3
  1 a  1
  2 e  5
cs95
  • Thanks that works. I was going to further ask in the Q, how to iterate the stack. Eg, if ` df.loc[x, 0] % 3` gives level-0 index [0,1,2], would like to further group this index by %2 to get a 3-level MultiIndex, what's the appropriate lambda? – alancalvitti Jan 22 '19 at 20:09
  • @alancalvitti It seems like you can just repeat this process twice, once for %3 and the second for %2. – cs95 Jan 22 '19 at 20:11
  • But now `df.loc[x, 0]` won't work as the groupings are generated based on the index, not the values. – alancalvitti Jan 22 '19 at 20:13
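For the follow-up in the comments (grouping the existing level-0 index by % 2 rather than the values), one possible approach is to pull the level out with get_level_values and pass the result as the grouper. This is a sketch that reuses the add_level helper from the answer; it is not something the answer itself spells out.

```python
import pandas as pd

def add_level(df, grouper):
    # Same helper as in the answer: group keys become a new outer index level.
    return pd.concat({k: v for k, v in df.groupby(by=grouper)})

df = pd.DataFrame.from_dict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}, orient='index')

# First level: the values modulo 3.
r = add_level(df, df.iloc[:, 0] % 3)

# Second level: the existing level-0 labels modulo 2.
# The array is positional, one entry per row of r.
level0 = r.index.get_level_values(0).to_numpy()
out = add_level(r, level0 % 2)
print(out)
```

Because the grouper is a plain array with one entry per row, it does not matter that the index of r is now a MultiIndex.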
2

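Use an assign chain with set_index. Note that this builds the new level from row position (np.arange(len(df)) % 2), not from the values, so the grouping differs from the one in the question: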

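import numpy as np

# Add the new level as a column, move it into the index, then make it the outer level.
(df.assign(indexlevel=np.arange(len(df)) % 2)
   .set_index('indexlevel', append=True)
   .swaplevel(0, 1)
   .sort_index(level=0))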
Out[30]: 
              0
indexlevel     
0          a  1
           c  3
           e  5
1          b  2
           d  4
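To get the grouping requested in the question with the same chain, the new level can be derived from the values instead of the row positions. A minimal sketch, assuming the df from the question; only the expression passed to assign changes.

```python
import pandas as pd

df = pd.DataFrame.from_dict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}, orient='index')

# Derive the new level from the values (column 0) rather than from row position.
out = (df.assign(indexlevel=df[0] % 2)
         .set_index('indexlevel', append=True)
         .swaplevel(0, 1)
         .sort_index())
print(out)
```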
BENY