Pandas dataframe reshaping with label as axis

Question

Supposed I have a set of data with two labels, put in a pandas Dataframe:

    label1 label2
0        0      a
1        1      a
2        1      a
3        1      a
4        1      a
5        2      b
6        0      b
7        1      b
8        2      b
9        0      b
10       2      c
11       1      c
12       2      c
13       0      c
14       2      c

Using the following code, the number of elements for each combination of labels can be obtained:

grouped = df.groupby(['label1', 'label2'], sort = False)
grouped.size()

The result is something like this:

label1  label2
0       a         1
1       a         4
2       b         2
0       b         2
1       b         1
2       c         3
1       c         1
0       c         1
dtype: int64

However, I would also like to compare the distribution of data count for label 2 in each label 1 group. I imagine the most convenient way to further manipulate the data for this purpose would be having a Dataframe (or some sort of table) with label1/2 as rows/columns and content as data count, like this:

    a   b   c
0   1   2   1
1   4   1   1
2   0   2   3

After a while of search, to my surprise, there seems no easy way to do this kind of dataframe reshaping in pandas.

Using a loop is possible. But I assumed it would be super slow, since in the real data, there are hundreds of thousands of different labels.

Moreover, there seems no way to get a group from only label1, after grouping with both label1 and label2, so the loop will have to be on the combination of labels, which might make things even slower and more complicated.

Anyone knows a smart way to do this?

Thank you very much guys! Once I learnt that this type of operation is called pivot, it became obvious that there are actually many simple ways to this. So I guess my real question was I didn't know the keyword... — cclkevin, Jun 17 '18 at 10:24

score 0 · Answer 1 · answered Jun 17 '18 at 00:57

0

Are you looking for pd.pivot_table?

df.pivot_table(index='label1', columns='label2', aggfunc='size').fillna(0)

answered Jun 17 '18 at 00:57

stefanw

10,456
3
36
34

score 0 · Accepted Answer · answered Jun 17 '18 at 01:12

0

Probably crosstab:

pd.crosstab(df.label1, df.label2)

answered Jun 17 '18 at 01:12

jxc

13,553
4
16
34

Pandas dataframe reshaping with label as axis

2 Answers2