Groupby and count() on dataframe having two categorical variables

Question

I have a dataset, which I need to groupby() and find the count of each unique combination.

    body-style        make
0  convertible  alfa-romeo
1  convertible  alfa-romeo
2    hatchback  alfa-romeo
3        sedan        audi
4        sedan        audi

My need is to produce an output as shown below:

             make   body-style   count
0     alfa-romero  convertible       2
1     alfa-romero    hatchback       1
2            audi    sedan           2

Tried the below code:

body = pd.DataFrame({'make':['alfa-romeo','alfa-romeo','alfa-romeo','audi','audi'], 'body-style':['convertible','convertible','hatchback','sedan','sedan']})

body.groupby(by=['make','body-style'], as_index=False).count()

This aggregation throws up "list index out of range" error. However, when I remove either of the columns from groupby clause, it is able to give me counts grouped by the remaining column.

If I remove as_index=False, there is no error, but the resultant object will have both columns - make and body-style as part of the index and there won'nt be any count data.

I can add another column to the datframe, fill it with 1s and take a sum() instead of count() on the groupby. But would like to know if there is a cleaner way to do this.

Check my answer with this problem from [this](https://stackoverflow.com/a/32307259/2901002) — jezrael, May 22 '18 at 10:55
If you add both make and body style as grouper, there is nothing left to count. Count is a special method that tallies non null values in a column. What you want is probably `.size()` — ayhan, May 22 '18 at 10:55
This seems like a very good question to me. It has a reproducible example, op's attempt, clear definition of the problem yet it is downvoted. Is there something I missed? — ayhan, May 22 '18 at 10:58

MaxU - stand with Ukraine · Accepted Answer · 2018-05-22T10:57:35.123

11

you can use GroupBy.count() only if you have at least one column that hasn't been used for grouping.

In case you group by all columns in the DF - use .size() instead:

In [119]: body.groupby(['make','body-style']).size().reset_index(name='count')
Out[119]:
         make   body-style  count
0  alfa-romeo  convertible      2
1  alfa-romeo    hatchback      1
2        audi        sedan      2

edited May 22 '18 at 10:57

answered May 22 '18 at 10:53

MaxU - stand with Ukraine

205,989
36
386
419

Could you please clarify why your solution works and the code in the question doesn't? – Sasha Tsukanov May 22 '18 at 10:55
@SashaTsukanov, sure, added a short description... – MaxU - stand with Ukraine May 22 '18 at 10:57
Say if there were only 2 body style for every make, style-A and style-B. How can we get count of each of the type in the same column from the previous dataframe? The output will be: make body-style-A count body-style-B count alfa-romero 2 1 audi 1 1 – saha rudra Jul 20 '22 at 18:55

Groupby and count() on dataframe having two categorical variables

1 Answers1