pandas - Counting occurrences of a value in a DataFrame per each unique value in another column

Question

Supposing that I have a DataFrame along the lines of:

    term      score
0   this          0
1   that          1
2   the other     3
3   something     2
4   anything      1
5   the other     2
6   that          2
7   this          0
8   something     1

How would I go about counting up the instances in the score column by unique values in the term column? Producing a result like:

    term      score 0     score 1     score 2     score 3
0   this            2           0           0           0
1   that            0           1           1           0
2   the other       0           0           1           1
3   something       0           1           1           0
4   anything        0           1           0           0

Related questions I've read here include Python Pandas counting and summing specific conditions and COUNTIF in pandas python over multiple columns with multiple conditions, but neither seems to quite be what I'm looking to do. pivot_table as mentioned at this question seems like it could be relevant but I'm impeded by lack of experience and the brevity of the pandas documentation. Thanks for any suggestions.

Relevant: https://stackoverflow.com/a/47152692/2336654 – piRSquared Sep 20 '18 at 14:11 — piRSquared, Sep 20 '18 at 14:11
That's a really useful resource, thank you. – Scott Martin Sep 20 '18 at 14:58 — Scott Martin, Sep 20 '18 at 14:58

Scott Boston · Answer 1 · 2018-09-20T14:24:33.093

7

You can also use, get_dummies, set_index, and sum with level parameter:

(pd.get_dummies(df.set_index('term'), columns=['score'], prefix_sep=' ')
   .sum(level=0)
   .reset_index())

Output:

        term  score 0  score 1  score 2  score 3
0       this        2        0        0        0
1       that        0        1        1        0
2  the other        0        0        1        1
3  something        0        1        1        0
4   anything        0        1        0        0

edited Sep 20 '18 at 14:24

answered Sep 20 '18 at 14:14

Scott Boston

147,308
15
139
187

1

Thank you, `get_dummies` is something I've been meaning to learn more about using and this will help with that. – Scott Martin Sep 20 '18 at 15:01

jezrael · Accepted Answer · 2018-09-20T14:11:56.677

6

Use groupby with size and reshape by unstack, last add_prefix:

df = df.groupby(['term','score']).size().unstack(fill_value=0).add_prefix('score ')

Or use crosstab:

df = pd.crosstab(df['term'],df['score']).add_prefix('score ')

Or pivot_table:

df = (df.pivot_table(index='term',columns='score', aggfunc='size', fill_value=0)
        .add_prefix('score '))

print (df)
score      score 0  score 1  score 2  score 3
term                                         
anything         0        1        0        0
something        0        1        1        0
that             0        1        1        0
the other        0        0        1        1
this             2        0        0        0

edited Sep 20 '18 at 14:11

answered Sep 20 '18 at 14:07

jezrael

822,522
95
1,334
1,252

3

I was about to post the crosstab, then you edited. Then I was like, oh pivot! And you edited :D. Nice answer – ALollz Sep 20 '18 at 14:11
I think `crosstab` is my favorite of the options here - succinct and simple. Thanks also for the range of methods - a lot to learn. – Scott Martin Sep 20 '18 at 14:59
@jezrael Where you've suggested `aggfunc='size'`, what is `'size'`? The examples in the docs show functions being passed in unquoted, like `aggfunc=np.sum`. – Scott Martin Sep 24 '18 at 14:02
1

@ScottMartin - This function is same like [`size`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html) – jezrael Sep 24 '18 at 14:03
1

I see - part of the [GroupBy computations / descriptive stats function group](https://pandas.pydata.org/pandas-docs/stable/api.html#id39). Thanks. – Scott Martin Sep 24 '18 at 14:13

pandas - Counting occurrences of a value in a DataFrame per each unique value in another column

2 Answers2

Linked