9

I'm making some crosstabs with pandas:

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one        two       
c    dull  shiny dull  shiny
a                     
bar     1     0    1      0
foo     2     0    1      2

I found workaround by adding new column and set levels as new MultiIndex, but it seems to be difficult...

Is there any way to pass MultiIndex to crosstabs function to predefine output columns?

norecces
  • 207
  • 3
  • 8

2 Answers2

8

The crosstab function has a parameter called dropna which is set to True by default. This parameter defines whether empty columns (such as the one-shiny column) should be displayed or not.

I tried calling the funcion like this:

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna = False)

and this is what I got:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

Hope that was still helpful.

Pintas
  • 268
  • 3
  • 5
5

I don't think there is a way to do this, and crosstab calls pivot_table in the source, which doesn't seem to offer this either. I raised it as an issue here.

A hacky workaround (which may or may not be the same as you were already using...):

from itertools import product
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
a_x_b = list(product(np.unique(b), np.unique(c)))
a_x_b = pd.MultiIndex.from_tuples(a_x_b)

In [15]: ct.reindex_axis(a_x_b, axis=1).fillna(0)
Out[15]:
      one          two
     dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2

If product is too slow, here is a numpy implementation of it.

buhtz
  • 10,774
  • 18
  • 76
  • 149
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • I included this example in [issue 3439](https://github.com/pydata/pandas/issues/3439) as I think one problem is, that the documentation describes something wrong. – bmu Jun 09 '13 at 08:33
  • I did mostly the same. I think my question is in approach that used in pandas. Crosstab function does what it is expected to do. So next question is: How can i groupby data by MultiIndex? By this i mean passing to groupby not only keys but keys and values. – norecces Jun 09 '13 at 10:18
  • @norecces the [implementation](https://github.com/pydata/pandas/blob/master/pandas/tools/pivot.py#L218) just uses pivot_table. (You should ask as a separate question, I think you can combine mapping and columns.) – Andy Hayden Jun 09 '13 at 10:37