Pandas Group By two columns and based on the value in one of them (categorical) write data into a specific column

Question

I have following dataframe:

df = pd.DataFrame([[1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],['A','B','B','B','C','D','D','E','A','C','C','C','A','B','B','B','B','D','E'], [18,25,47,27,31,55,13,19,73,55,58,14,2,46,33,35,24,60,7]]).T
df.columns = ['Brand_ID','Category','Price']

   Brand_ID Category Price
0         1        A    18
1         1        B    25
2         1        B    47
3         1        B    27
4         1        C    31
5         1        D    55
6         1        D    13
7         1        E    19
8         2        A    73
9         2        C    55
10        2        C    58
11        2        C    14
12        3        A     2
13        3        B    46
14        3        B    33
15        3        B    35
16        3        B    24
17        3        D    60
18        3        E     7

What I need to do is to group by Brand_ID and category and count (similar to the first part of this question). However, I need instead to write the output into a different column depending on the category. So my Output should look like follows:

   Brand_ID  Category_A  Category_B  Category_C  Category_D  Category_E
0         1           1           3           1           2           1
1         2           1           0           3           0           0
2         3           1           4           0           1           1

Is there any possibility to do this directly with pandas?

Scott Boston · Accepted Answer · 2018-10-10T17:04:11.230

3

Try:

df.groupby(['Brand_ID','Category'])['Price'].count()\
  .unstack(fill_value=0)\
  .add_prefix('Category_')\
  .reset_index()\
  .rename_axis([None], axis=1)

Output

   Brand_ID  Category_A  Category_B  Category_C  Category_D  Category_E
0         1           1           3           1           2           1
1         2           1           0           3           0           0
2         3           1           4           0           1           1

OR

pd.crosstab(df.Brand_ID, df.Category)\
  .add_prefix('Category_')\
  .reset_index()\
  .rename_axis([None], axis=1)

edited Oct 10 '18 at 17:04

answered Oct 10 '18 at 17:00

Scott Boston

147,308
15
139
187

For whatever reason, this is ~2x faster than using `pivot_table`. – PMende Oct 10 '18 at 17:03
1

Thanks! The first answer is exactly what I'm looking for. I would replace ['Price'].count() by size() to make it more generic (for cases where empty values exist) – mshabeeb Oct 10 '18 at 20:39

score 3 · Answer 2 · answered Oct 10 '18 at 17:01

3

You're describing a pivot_table:

df.pivot_table(index='Brand_ID', columns='Category', aggfunc='size', fill_value=0)

Output:

Category  A  B  C  D  E
Brand_ID               
1         1  3  1  2  1
2         1  0  3  0  0
3         1  4  0  1  1

answered Oct 10 '18 at 17:01

PMende

5,171
2
19
26

Thanks! A reset_index() must be appended to get an answer similar to the one above. – mshabeeb Oct 10 '18 at 20:38

Pandas Group By two columns and based on the value in one of them (categorical) write data into a specific column

2 Answers2