4

Supposing that I have a DataFrame along the lines of:

    term      score
0   this          0
1   that          1
2   the other     3
3   something     2
4   anything      1
5   the other     2
6   that          2
7   this          0
8   something     1

How would I go about counting up the instances in the score column by unique values in the term column? Producing a result like:

    term      score 0     score 1     score 2     score 3
0   this            2           0           0           0
1   that            0           1           1           0
2   the other       0           0           1           1
3   something       0           1           1           0
4   anything        0           1           0           0

Related questions I've read here include Python Pandas counting and summing specific conditions and COUNTIF in pandas python over multiple columns with multiple conditions, but neither seems to quite be what I'm looking to do. pivot_table as mentioned at this question seems like it could be relevant but I'm impeded by lack of experience and the brevity of the pandas documentation. Thanks for any suggestions.

Scott Martin
  • 1,260
  • 2
  • 17
  • 27

2 Answers2

7

You can also use, get_dummies, set_index, and sum with level parameter:

(pd.get_dummies(df.set_index('term'), columns=['score'], prefix_sep=' ')
   .sum(level=0)
   .reset_index())

Output:

        term  score 0  score 1  score 2  score 3
0       this        2        0        0        0
1       that        0        1        1        0
2  the other        0        0        1        1
3  something        0        1        1        0
4   anything        0        1        0        0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
6

Use groupby with size and reshape by unstack, last add_prefix:

df = df.groupby(['term','score']).size().unstack(fill_value=0).add_prefix('score ')

Or use crosstab:

df = pd.crosstab(df['term'],df['score']).add_prefix('score ')

Or pivot_table:

df = (df.pivot_table(index='term',columns='score', aggfunc='size', fill_value=0)
        .add_prefix('score '))

print (df)
score      score 0  score 1  score 2  score 3
term                                         
anything         0        1        0        0
something        0        1        1        0
that             0        1        1        0
the other        0        0        1        1
this             2        0        0        0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 3
    I was about to post the crosstab, then you edited. Then I was like, oh pivot! And you edited :D. Nice answer – ALollz Sep 20 '18 at 14:11
  • I think `crosstab` is my favorite of the options here - succinct and simple. Thanks also for the range of methods - a lot to learn. – Scott Martin Sep 20 '18 at 14:59
  • @jezrael Where you've suggested `aggfunc='size'`, what is `'size'`? The examples in the docs show functions being passed in unquoted, like `aggfunc=np.sum`. – Scott Martin Sep 24 '18 at 14:02
  • 1
    @ScottMartin - This function is same like [`size`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html) – jezrael Sep 24 '18 at 14:03
  • 1
    I see - part of the [GroupBy computations / descriptive stats function group](https://pandas.pydata.org/pandas-docs/stable/api.html#id39). Thanks. – Scott Martin Sep 24 '18 at 14:13