Missing data in pandas.crosstab

Question

I'm making some crosstabs with pandas:

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one        two       
c    dull  shiny dull  shiny
a                     
bar     1     0    1      0
foo     2     0    1      2

I found workaround by adding new column and set levels as new MultiIndex, but it seems to be difficult...

Is there any way to pass MultiIndex to crosstabs function to predefine output columns?

"seems to be difficult" is such a glorious understatement :) — Andy Hayden, Jun 09 '13 at 18:41

score 8 · Answer 1 · answered Jan 14 '14 at 10:18

The crosstab function has a parameter called dropna which is set to True by default. This parameter defines whether empty columns (such as the one-shiny column) should be displayed or not.

I tried calling the funcion like this:

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna = False)

and this is what I got:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

Hope that was still helpful.

score 5 · Accepted Answer · edited Jan 20 '23 at 10:18

5

I don't think there is a way to do this, and crosstab calls pivot_table in the source, which doesn't seem to offer this either. I raised it as an issue here.

A hacky workaround (which may or may not be the same as you were already using...):

from itertools import product
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
a_x_b = list(product(np.unique(b), np.unique(c)))
a_x_b = pd.MultiIndex.from_tuples(a_x_b)

In [15]: ct.reindex_axis(a_x_b, axis=1).fillna(0)
Out[15]:
      one          two
     dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2

If product is too slow, here is a numpy implementation of it.

edited Jan 20 '23 at 10:18

buhtz

10,774
18
76
149

answered Jun 08 '13 at 20:31

Andy Hayden

359,921
101
625
535

I included this example in [issue 3439](https://github.com/pydata/pandas/issues/3439) as I think one problem is, that the documentation describes something wrong. – bmu Jun 09 '13 at 08:33
I did mostly the same. I think my question is in approach that used in pandas. Crosstab function does what it is expected to do. So next question is: How can i groupby data by MultiIndex? By this i mean passing to groupby not only keys but keys and values. – norecces Jun 09 '13 at 10:18
@norecces the [implementation](https://github.com/pydata/pandas/blob/master/pandas/tools/pivot.py#L218) just uses pivot_table. (You should ask as a separate question, I think you can combine mapping and columns.) – Andy Hayden Jun 09 '13 at 10:37

Missing data in pandas.crosstab

2 Answers2

Linked