Append count of rows meeting a condition within a group to Pandas dataframe

Question

I know how to append a column counting the number of elements in a group, but I need to do so just for the number within that group that meets a certain condition.

For example, if I have the following data:

import numpy as np
import pandas as pd

columns=['group1', 'value1']

data = np.array([np.arange(5)]*2).T
mydf = pd.DataFrame(data, columns=columns)

mydf.group1 = [0,0,1,1,2]
mydf.value1 = ['P','F',100,10,0]

valueslist={'50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69','70','71','72','73','74','75','76','77','78','79','80','81','82','83','84','85','86','87','88','89','90','91','92','93','94','95','96','97','98','99','100','A','B','C','D','P','S'}

and my dataframe therefore looks like this:

mydf

   group1 value1
0       0      P
1       0      F
2       1    100
3       1     10
4       2      0

I would then want to count the number of rows within each group1 value where value1 is in valuelist.

My desired output is:

   group1 value1 count
0       0      P     1
1       0      F     1
2       1    100     1
3       1     10     1
4       2      0     0

I think the output is unclear. Why is it a 1 on row3? (+1 for giving good sample data) — Anton vBR, Oct 09 '17 at 15:42
@AntonvBR Because there's only 1 `value1` value in `group1` = `1` that's in `valuelist`. Do you see what I mean? We are counting within the groups of `group1` how many rows have `value1` values that are in the `valuelist`. To give you context, these are grades for groups of classes. So the `value1` value of 100 was passing but 10 was not. Thus `count` shows that 1 class in the `group1` group "1" of elective classes has been passed. Same thing applies to the group 0 of `group1`. One had a passing value (`P`) and one failing (`F`) thus the total number passed for that group was 1. — Hack-R, Oct 09 '17 at 15:50

score 2 · Accepted Answer · answered Oct 09 '17 at 15:43

After changing the type of the value1 column to match your valueslist (or the other way around), you can use isin to get a True/False column, and convert that to 1s and 0s with astype(int). Then we can apply an ordinary groupby transform:

In [13]: mydf["value1"] = mydf["value1"].astype(str)

In [14]: mydf["count"] = (mydf["value1"].isin(valueslist).astype(int) 
                          .groupby(mydf["group1"]).transform(sum))

In [15]: mydf
Out[15]: 
   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0

Looks great thanks much. Since there are characters in `valuelist` I take it we should use the `astype(str)` but not `astype(int)`. That works for me, thanks again. — Hack-R, Oct 09 '17 at 15:59

Ted Petrou · Answer 2 · 2017-10-09T15:50:18.050

1

You can groupby each group1 and then use transform to find the max of whether your values are in the list.

mydf['count'] = mydf.groupby('group1').transform(lambda x: x.astype(str).isin(valueslist).sum())

   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0

edited Oct 09 '17 at 15:50

answered Oct 09 '17 at 15:43

Ted Petrou

59,042
19
131
136

score 1 · Answer 3 · answered Oct 09 '17 at 15:44

mydf.value1=mydf.value1.astype(str)
mydf['count']=mydf.group1.map(mydf.groupby('group1').apply(lambda x : sum(x.value1.isin(valueslist))))
mydf
Out[412]: 
   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0

Data input :

valueslist=['50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69','70','71','72','73','74','75','76','77','78','79','80','81','82','83','84','85','86','87','88','89','90','91','92','93','94','95','96','97','98','99','100','A','B','C','D','P','S']

score 0 · Answer 4 · answered Oct 09 '17 at 15:59

Here is one way to do it, albeit a one-liner:

mydf.merge(mydf.groupby('group1').apply(lambda x: len(set(x['value1'].values).intersection(valueslist))).reset_index().rename(columns={0: 'count'}), how='inner', on='group1')


   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0

Append count of rows meeting a condition within a group to Pandas dataframe

4 Answers4