Pivot Tables of Counts in Pandas DataFrame

Question

I have a pandas dataframe:

  Col X    Col Y
class 1    cat 1
class 2    cat 1
class 3    cat 2
class 2    cat 3

that I want to transform into:

         cat 1  cat 2  cat 3
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0

where the values are value counts. How do I do it?

Zero · Accepted Answer · 2017-09-09T05:13:26.940

95

Here are a couple of ways to reshape your data, df:

In [27]: df
Out[27]:
     Col X  Col Y
0  class 1  cat 1
1  class 2  cat 1
2  class 3  cat 2
3  class 2  cat 3
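
To follow along, the frame can be rebuilt like this. It is a minimal sketch that assumes pandas is imported as `pd`; the values are copied from the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})
print(df)
```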

1) Using pd.crosstab()

In [28]: pd.crosstab(df['Col X'], df['Col Y'])
Out[28]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0

2) Or, group by 'Col X' and 'Col Y', take the size of each group, and unstack 'Col Y', filling the missing combinations with zeros.

In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0)
Out[29]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
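
To see why this works, it can help to look at the intermediate result. In the sketch below, which rebuilds the example frame, the names `sizes` and `table` are only for illustration. `size()` returns a Series of counts indexed by the ('Col X', 'Col Y') pairs that actually occur, and `unstack` moves 'Col Y' into the columns, with `fill_value=0` covering the pairs that never occur.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# Step 1: one count per observed (Col X, Col Y) pair, as a Series with a MultiIndex.
sizes = df.groupby(["Col X", "Col Y"]).size()
print(sizes)

# Step 2: move the 'Col Y' index level into the columns; absent pairs become 0.
table = sizes.unstack("Col Y", fill_value=0)
print(table)
```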

3) Or, use pd.pivot_table() with index='Col X', columns='Col Y', and aggfunc=len

In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
Out[30]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
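
Because both columns are used as keys here, `aggfunc=len` has no value column left to aggregate, and how that case is handled may differ between pandas releases (this has not been checked across versions). A variant that avoids the question is `aggfunc='size'`, which counts rows per group. A minimal sketch, rebuilding the example frame from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# aggfunc="size" counts the rows in each (Col X, Col Y) group and needs no value column.
table = df.pivot_table(index="Col X", columns="Col Y", aggfunc="size", fill_value=0)
print(table)
```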

4) Or, use set_index with unstack

In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0)
Out[492]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
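
One caveat, also raised in a comment on this answer: `unstack` needs each ('Col X', 'Col Y') pair to be unique in the index, so this approach raises an error when the data contains a repeated row. The sketch below adds a hypothetical duplicate row (class 2, cat 1) that is not in the original data to show the failure, and contrasts it with `pd.crosstab`, which counts the duplicate instead.

```python
import pandas as pd

# Example frame plus one repeated (class 2, cat 1) row. The duplicate is an
# assumption added for illustration; it is not part of the original data.
dup = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3", "cat 1"],
})

try:
    dup.assign(v=1).set_index(["Col X", "Col Y"])["v"].unstack(fill_value=0)
    failed = False
except ValueError as exc:
    # pandas refuses to reshape an index that contains duplicate entries.
    failed = True
    print("unstack failed:", exc)

# crosstab aggregates instead, so the repeated pair is counted twice.
counts = pd.crosstab(dup["Col X"], dup["Col Y"])
print(counts)
```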

edited Sep 09 '17 at 05:13

answered Jun 06 '15 at 06:05


1

Thanks John - that was incredibly helpful, especially providing different possibilities! I didn't even think of the cross tab possibility. – SteelyDanish Jun 06 '15 at 20:28
Thanks for the comparison of all three. I default to groupby, and often see pivot_table used. – Waylon Walker May 04 '17 at 15:33
2

Came across this because I was trying to figure out the difference between groupby and pivot_table and when to use which. Your answer was certainly helpful. Do you know of any easily comprehensible information on the different concepts? Cheers – Fabian Bosler Sep 17 '17 at 21:34
1

Warning: the last method (`set_index` and `unstack`) does not generally work: it fails when there is a duplicate line in the original data. – Eric O. Lebigot Jun 14 '18 at 10:16

score 1 · Answer 2 · answered Mar 24 '23 at 22:47

1

Since pandas 1.1.0, value_counts() can be called on a DataFrame. So another way is to count each pair of Col X-Col Y values and unstack the counts.

table = df[['Col X', 'Col Y']].value_counts().rename_axis([None, None]).unstack(fill_value=0)
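
A runnable version of the line above, as a minimal sketch that rebuilds the example frame from the question. `rename_axis([None, None])` only clears the 'Col X' and 'Col Y' axis labels, so the printed table matches the layout asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# Requires pandas >= 1.1.0, where DataFrame.value_counts() was added.
pair_counts = df[["Col X", "Col Y"]].value_counts()

# Drop the index level names, then move the second level into the columns.
table = pair_counts.rename_axis([None, None]).unstack(fill_value=0)
print(table)
```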

cottontail

