
Is there any way in pandas to group data by a MultiIndex? By this I mean passing the groupby function not only the keys but also the full set of values for each key, so that the index of the result is predefined.

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)
df = pd.DataFrame([a, b, c]).T
df.columns = ['a', 'b', 'c']
# count the rows in each observed (a, b, c) group
df.groupby(['a', 'b', 'c']).apply(len)

a    b    c    
bar  one  dull     1
     two  dull     1
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2

But what I actually want is the following:

mi = pd.MultiIndex(levels=[['foo', 'bar'], ['one', 'two'], ['dull', 'shiny']],
                   labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]])
# pseudocode: groupby has no multi_index argument
df.groupby(['a', 'b', 'c'], multi_index = mi).apply(len)

a    b    c    
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2

The way I see it, this could be done by writing an additional wrapper around the groupby object. Or maybe this feature fits the pandas philosophy well and could be included in the pandas library itself?

norecces

1 Answer


Just reindex against the full MultiIndex and fill the missing combinations with 0:

In [14]: df.groupby(['a', 'b', 'c']).size().reindex(index=mi).fillna(0)
Out[14]: 
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
dtype: float64
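
The same idea as a self-contained sketch for more recent pandas versions. Two things here are assumptions that are not in the original answer: the index is built with `pd.MultiIndex.from_product` (later pandas releases renamed the `labels` argument of the `MultiIndex` constructor to `codes`, so the constructor call from the question may not run as written), and `reindex` is given `fill_value=0` instead of a separate `fillna(0)`, which should keep the counts as integers rather than floats.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["foo", "foo", "foo", "bar", "bar", "foo", "foo"],
    "b": ["one", "one", "two", "one", "two", "two", "two"],
    "c": ["dull", "shiny", "dull", "dull", "dull", "shiny", "shiny"],
})

keys = ["a", "b", "c"]

# Every combination of key values that should appear in the result,
# whether or not it occurs in the data (assumed to be the full product).
full_index = pd.MultiIndex.from_product(
    [["foo", "bar"], ["one", "two"], ["dull", "shiny"]],
    names=keys,
)

# Count the observed groups, then align to the full index;
# combinations missing from the data are filled with 0.
counts = df.groupby(keys).size().reindex(full_index, fill_value=0)
print(counts)
```

Rows follow the order of `full_index`, so sorting or reordering that index is one way to control the layout of the output.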
Jeff
  • I think what could be included is maybe a keyword ``dropna=False`` (which would normally default to True) to include all combinations for a MultiIndex (which is what you have here). This is similar to a new feature we are introducing in 0.11.1: http://pandas.pydata.org/pandas-docs/dev/groupby.html#filtration, which has this same property. – Jeff Jun 10 '13 at 15:58
  • thx, it would be great! My first question was about crosstab function - so you answered it too http://stackoverflow.com/questions/17003034/missing-data-in-pandas-crosstab . – norecces Jun 10 '13 at 16:08
  • In my pandas (version 0.11.1-dev) there is no dropna=False option in the filter function. And as far as I understand from the source code, the groupby function does not evaluate all possible combinations. So I am interested in how you would add this option to the groupby code. – norecces Jun 10 '13 at 16:22
  • try updating to master, was just added a few days ago – Jeff Jun 10 '13 at 16:23
  • You answered my previous question not by typing "reindex" but by posting the issue, which tells me that in the current version this behavior of crosstab and groupby is not possible by default without the reindex workaround. – norecces Jun 10 '13 at 16:35
  • no....I was talking about providing a future keyword, that's what the issue is about; the solution I posted works now. I posted the filtration link to get an idea how a feature like this works (it is related, but not identical) to your issue. The reindexing is the correct method until/unless a specific feature is added – Jeff Jun 10 '13 at 16:42