I want to count the occurrences of a string in a column of a grouped pandas DataFrame.

Assume I have the following DataFrame:

catA    catB    scores
A       X       6-4 RET
A       X       6-4 6-4
A       Y       6-3 RET
B       Z       6-0 RET
B       Z       6-1 RET

First, I want to group by catA and catB. Then, for each group, I want to count how many rows contain RET in the scores column.

The result should look something like this:

catA    catB    RET
A       X       1
A       Y       1
B       Z       2

Grouping by two columns is easy: `grouped = df.groupby(['catA', 'catB'])`

But what's next?

beta

1 Answer

Select the 'scores' column on the groupby object and call apply. Inside the lambda, the vectorised str method contains gives a boolean mask; use it to filter the group, then call count on the rows that remain:

In [34]:    
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())

Out[34]:
catA  catB
A     X       1
      Y       1
B     Z       2
Name: scores, dtype: int64
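
For anyone who wants to reproduce this and get the result in the flat catA / catB / RET layout from the question, here is a minimal, self-contained sketch. It rebuilds the sample DataFrame by hand and swaps in a slightly different technique from the lambda above: it sums the boolean mask per group instead of filtering and counting. The names `has_ret` and `ret_counts` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Boolean mask: True where the score contains "RET".
has_ret = df["scores"].str.contains("RET")

# Summing booleans per group counts the matching rows;
# reset_index turns the group keys back into ordinary columns.
ret_counts = (
    has_ret.groupby([df["catA"], df["catB"]])
    .sum()
    .reset_index(name="RET")
)
print(ret_counts)
```

Because True counts as 1 when summed, this should give the same numbers as the filter-then-count version, without a Python-level lambda being called for each group.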

To assign the count as a column, use transform instead, so that the result is a Series whose index is aligned with the original df and every row receives its group's count:

In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df

Out[35]:
  catA catB   scores count
0    A    X  6-4 RET     1
1    A    X  6-4 6-4     1
2    A    Y  6-3 RET     1
3    B    Z  6-0 RET     2
4    B    Z  6-1 RET     2
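
Once the count lives in a column aligned with the original rows, it can be used as an ordinary boolean mask, for example to show only the groups whose count exceeds some number. The sketch below assumes the same sample data; `min_count` and `filtered` are hypothetical names, and the threshold value is arbitrary.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Flag rows containing "RET" as 0/1, then broadcast each group's
# total back onto every row of that group.
df["count"] = (
    df["scores"].str.contains("RET")
    .astype(int)
    .groupby([df["catA"], df["catB"]])
    .transform("sum")
)

min_count = 1  # hypothetical threshold, chosen only for illustration
filtered = df[df["count"] > min_count]
print(filtered)
```

The assignment to `df["count"]` persists on the DataFrame itself, while `filtered` is a separate selection that leaves `df` unchanged.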
EdChum
  • Is this then permanently stored in a new column? If not, how can it be stored as one? I only want to display the output if the count is greater than a certain number. – beta Jul 27 '15 at 09:48
  • How can I search for two different strings, so that the string can contain `RET` or `ASDF`? Then I need a regex, right? – beta Jul 27 '15 at 09:58
  • Use `x.str.contains('RET|ASDF')`. Also, you should post your full requirement: update your question, and keep it to one problem per question rather than adding to your problem incrementally. – EdChum Jul 27 '15 at 10:00
  • Sorry, I did not know about this requirement when I asked the question. It's fine now... – beta Jul 27 '15 at 10:05
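
As a small illustration of the `'RET|ASDF'` suggestion in the comments: `str.contains` treats its pattern as a regular expression by default, so `|` acts as "or". The sketch below builds the pattern from a list of terms and escapes each one, which is an added precaution in case a term contains regex metacharacters. The Series contents are made up for illustration and are not from the question.

```python
import re

import pandas as pd

# Illustrative data only; the None stands in for a missing score.
scores = pd.Series(["6-4 RET", "6-4 6-4", "6-3 ASDF", None])

terms = ["RET", "ASDF"]

# Join the escaped terms with "|" so the regex matches any of them.
pattern = "|".join(re.escape(t) for t in terms)

# na=False makes missing values count as "no match" instead of NaN.
mask = scores.str.contains(pattern, na=False)
print(mask.tolist())
```

The resulting mask can be dropped into either the apply or the transform version in place of `x.str.contains('RET')`.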