GroupBy object in which entries can belong to several groups

Question

Judging by the methods it supports, nothing seems to prevent labels of the original DataFrame/Series from occurring multiple times in a derived GroupBy object. Is it actually possible to construct, for example, a GroupBy object g from an iterable column like a in

>>> x
        a  b
0  [0, 1]  1
1  [1, 2]  2

such that g represents a GroupBy object with one entry for each element of a's values? That is, I would get results like

>>> x.iterable_groupby('a').size()
a
0    1
1    2
2    1
>>> x.iterable_groupby('a').mean()
     b
0  1.0
1  1.5
2  2.0

score 4 · Accepted Answer · edited May 23 '17 at 11:45

You should reshape your DataFrame into a tidy dataset. The reshaping part is asked about frequently (1, 2, 3).

In a tidy dataset, each row should represent a single record. For that, you can create a 'grouper' column like this:

x['a'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('grouper')
Out: 
   grouper
0        0
0        1
1        1
1        2

If you join this with the original DataFrame, it can be grouped as you like:

x['a'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('grouper').join(x[['b']]).groupby('grouper').mean()
Out: 
           b
grouper     
0        1.0
1        1.5
2        2.0
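
For reference, here is a self-contained sketch of the steps above that also produces the size() result from the question. It assumes x is built as shown in the question; the names tidy, sizes and means are only illustrative. Only column b is joined, so the list-valued column a is not passed to mean().

```python
import pandas as pd

# Example frame from the question: column 'a' holds lists of group labels.
x = pd.DataFrame({'a': [[0, 1], [1, 2]], 'b': [1, 2]})

# One row per (original row, label) pair; the original index is kept.
grouper = (
    x['a'].apply(pd.Series)
    .stack()
    .reset_index(level=1, drop=True)
    .to_frame('grouper')
)

# Join the numeric column back on the (now repeated) index and group.
tidy = grouper.join(x[['b']])
sizes = tidy.groupby('grouper').size()
means = tidy.groupby('grouper')['b'].mean()

print(sizes)
print(means)
```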

The reshaping part is not very efficient, but as far as I know pandas does not offer a better method for it yet.
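
Later pandas releases (0.25 and newer) added DataFrame.explode, which does this kind of reshaping directly. A minimal sketch, assuming such a version is installed and x is the frame from the question (tidy, sizes and means are illustrative names):

```python
import pandas as pd

# Example frame from the question: column 'a' holds lists of group labels.
x = pd.DataFrame({'a': [[0, 1], [1, 2]], 'b': [1, 2]})

# explode() repeats each row once per list element (requires pandas >= 0.25).
# The exploded column has object dtype, so cast it back to integers.
tidy = x.explode('a').astype({'a': int})

sizes = tidy.groupby('a').size()
means = tidy.groupby('a')['b'].mean()

print(sizes)
print(means)
```

Here the exploded column itself serves as the group key, so no separate 'grouper' column or join is needed.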
