I know about the question "How to make a pandas crosstab with percentages?", but the solution proposed there does not work with three-way tables.

Consider the following table:

import pandas as pd

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4})




pd.crosstab(df.A,[df.B,df.C],colnames=['topgroup','bottomgroup'])
Out[89]: 
topgroup      A       B       C    
bottomgroup bar foo bar foo bar foo
A                                  
one           2   2   2   2   2   2
three         2   0   0   2   2   0
two           0   2   2   0   0   2

Here, I would like to get the row percentage, within each topgroup (A, B and C).

Using apply(lambda x: x / x.sum(), axis=1) does not give this: it divides each cell by the total of the whole row, whereas the percentages have to sum to 1 within each group.
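
To make the problem concrete, here is a minimal sketch of the plain row normalization on the table above. The names `table` and `row_pct` are only illustrative. Each cell is divided by the total of all six columns in its row, so the shares inside a single topgroup add up to less than 1.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4})

table = pd.crosstab(df.A, [df.B, df.C], colnames=['topgroup', 'bottomgroup'])

# Divide every cell by the total of its entire row (all topgroups together).
row_pct = table.div(table.sum(axis=1), axis=0)

# Share of topgroup 'A' in row 'one': 4 of 12 observations, not 1.
share_of_A = row_pct['A'].loc['one'].sum()
print(share_of_A)
```

For row `one`, every cell becomes 2/12, so the two cells under topgroup `A` sum to 1/3 rather than 1.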

Any ideas?

Alex Riley
ℕʘʘḆḽḘ

1 Answer

If I understand your question, it seems that you could write:

>>> table = pd.crosstab(df.A,[df.B,df.C], colnames=['topgroup','bottomgroup'])
>>> table / table.sum(axis=1, level=0)

topgroup       A         B         C     
bottomgroup  bar  foo  bar  foo  bar  foo
A                                        
one          0.5  0.5  0.5  0.5  0.5  0.5
three        1.0  0.0  0.0  1.0  1.0  0.0
two          0.0  1.0  1.0  0.0  0.0  1.0
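
A note for newer pandas: the `level` keyword of `DataFrame.sum` was deprecated in pandas 1.3 and removed in 2.0, so the expression above only runs on older versions. Below is a minimal sketch of one possible equivalent, not the only one. It assumes the same `df` as in the question, and the names `group_totals` and `pct` are only illustrative. It transposes the table, computes the per-group totals with `groupby(...).transform('sum')` so that they keep the table's shape, and transposes back before dividing.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4})

table = pd.crosstab(df.A, [df.B, df.C], colnames=['topgroup', 'bottomgroup'])

# Per-row totals within each topgroup, broadcast back to the table's shape.
# Transposing first means we group on the row index instead of along axis=1.
group_totals = table.T.groupby(level='topgroup').transform('sum').T

# Both frames now have identical row and column labels, so this is a
# plain element-wise division.
pct = table / group_totals
print(pct)
```

If a topgroup has no observations at all in some row, its total is 0 and the corresponding cells come out as NaN (0/0). That does not happen with this data.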
Alex Riley
  • oh man this is pure pandas magic. it now seems so obvious once I see it. thanks. I think your idea will be pretty useful for others. `level=0` is because my table is multiindexed by the columns, right? – ℕʘʘḆḽḘ Apr 05 '16 at 14:38
  • Thanks EdChum. @Noobie: that's right, `axis=1` says we want to apply the operation along each row, and if you have a MultiIndex, you can pass a `level` argument to apply that method over a particular level of that MultiIndex. – Alex Riley Apr 05 '16 at 14:40