Judging by the methods it supports, nothing seems to prevent labels of the original DataFrame/Series from occurring multiple times in a derived GroupBy object. Is it actually possible, for example, to construct a GroupBy object g from a column of iterables such as a in

>>> x
       a b
0 [0, 1] 1
1 [1, 2] 2

such that g has one group for each distinct element found in a's values? That is, I would like to get results like

>>> x.iterable_groupby('a').size()
a
0 1
1 2
2 1
>>> x.iterable_groupby('a').mean()
    b
0 1.0
1 1.5
2 2.0
ayhan
Anaphory

1 Answer

You should reshape your DataFrame into a tidy dataset. The reshaping part is asked about frequently (1, 2, 3).

In a tidy dataset, each row should represent a single record. For that, you can create a 'grouper' column like this:

x['a'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('grouper')
Out: 
   grouper
0        0
0        1
1        1
1        2

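If you join this with the remaining columns of the original DataFrame, it can be grouped as you like (the list column a is dropped before the join because recent pandas versions raise an error when asked to average a non-numeric column):

x['a'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('grouper').join(x.drop(columns='a')).groupby('grouper').mean()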
Out: 
           b
grouper     
0        1.0
1        1.5
2        2.0
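
For reference, the steps above can be put together as a self-contained sketch. The names `grouper` and `tidy` are only illustrative, and the example DataFrame is rebuilt from the question; it assumes all lists in `a` have the same length, as they do here.

```python
import pandas as pd

# Example data from the question: column "a" holds lists.
x = pd.DataFrame({"a": [[0, 1], [1, 2]], "b": [1, 2]})

# Build the 'grouper' column: one row per list element,
# keeping the original row label as the index.
grouper = (
    x["a"]
    .apply(pd.Series)                 # one column per list position
    .stack()                          # long format, index = (row, position)
    .reset_index(level=1, drop=True)  # drop the position level
    .to_frame("grouper")
)

# Join only the numeric column so that mean() has nothing non-numeric to average.
tidy = grouper.join(x[["b"]])

sizes = tidy.groupby("grouper").size()
means = tidy.groupby("grouper").mean()
```

Here `sizes` and `means` correspond to the `size()` and `mean()` results asked for in the question.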

The reshaping part is not very efficient, but as far as I know pandas does not offer a better method for that yet.
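
On newer pandas versions (0.25 and later) the reshaping step can probably be replaced by `DataFrame.explode`, which was added after this answer was written. A minimal sketch, assuming such a version is installed and using the question's example data (the variable names are illustrative):

```python
import pandas as pd

# Example data from the question: column "a" holds lists.
x = pd.DataFrame({"a": [[0, 1], [1, 2]], "b": [1, 2]})

# explode() repeats each row once per list element in "a";
# the resulting column has object dtype, so cast it back to integers.
exploded = x.explode("a").astype({"a": "int64"})

exploded_sizes = exploded.groupby("a").size()
exploded_means = exploded.groupby("a").mean()
```

This avoids the intermediate `apply(pd.Series)` step and also works when the lists have different lengths.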

ayhan