1

I have following dataframe:

df = pd.DataFrame([[1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],['A','B','B','B','C','D','D','E','A','C','C','C','A','B','B','B','B','D','E'], [18,25,47,27,31,55,13,19,73,55,58,14,2,46,33,35,24,60,7]]).T
df.columns = ['Brand_ID','Category','Price']

   Brand_ID Category Price
0         1        A    18
1         1        B    25
2         1        B    47
3         1        B    27
4         1        C    31
5         1        D    55
6         1        D    13
7         1        E    19
8         2        A    73
9         2        C    55
10        2        C    58
11        2        C    14
12        3        A     2
13        3        B    46
14        3        B    33
15        3        B    35
16        3        B    24
17        3        D    60
18        3        E     7

What I need to do is to group by Brand_ID and category and count (similar to the first part of this question). However, I need instead to write the output into a different column depending on the category. So my Output should look like follows:

   Brand_ID  Category_A  Category_B  Category_C  Category_D  Category_E
0         1           1           3           1           2           1
1         2           1           0           3           0           0
2         3           1           4           0           1           1

Is there any possibility to do this directly with pandas?

mshabeeb
  • 577
  • 2
  • 9
  • 25

2 Answers2

3

Try:

df.groupby(['Brand_ID','Category'])['Price'].count()\
  .unstack(fill_value=0)\
  .add_prefix('Category_')\
  .reset_index()\
  .rename_axis([None], axis=1)

Output

   Brand_ID  Category_A  Category_B  Category_C  Category_D  Category_E
0         1           1           3           1           2           1
1         2           1           0           3           0           0
2         3           1           4           0           1           1

OR

pd.crosstab(df.Brand_ID, df.Category)\
  .add_prefix('Category_')\
  .reset_index()\
  .rename_axis([None], axis=1)
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • For whatever reason, this is ~2x faster than using `pivot_table`. – PMende Oct 10 '18 at 17:03
  • 1
    Thanks! The first answer is exactly what I'm looking for. I would replace ['Price'].count() by size() to make it more generic (for cases where empty values exist) – mshabeeb Oct 10 '18 at 20:39
3

You're describing a pivot_table:

df.pivot_table(index='Brand_ID', columns='Category', aggfunc='size', fill_value=0)

Output:

Category  A  B  C  D  E
Brand_ID               
1         1  3  1  2  1
2         1  0  3  0  0
3         1  4  0  1  1
PMende
  • 5,171
  • 2
  • 19
  • 26