Perform "count" in all subsets of a set of columns

Asked Feb 20 '18 at 03:43

Active Feb 20 '18 at 05:36

Viewed 94 times

I have a DataFrame, df:

import pandas as pd

l = [['a',1,3,3,1,1,3,3,3],['b',1,1,3,1,3,3,1,3],['c',1,1,1,1,3,1,1,1]]
col = ['id','x1','x2','x3','x4','y1','y2','y3','y4']

df = pd.DataFrame(l, columns=col)

I want to count the number of rows (ids) whose value is 1 in every column of a chosen pair of subsets, one drawn from the subsets of X = {x1, x2, x3, x4} and one from the subsets of Y = {y1, y2, y3, y4}. For example, for the pair s1 = {[x1, x3], [y2, y3, y4]}, the code is:

df[(df['x1']==1) & (df['x3']==1) & (df['y2'] == 1) & (df['y3'] == 1) & (df['y4'] == 1)].count()['id']

which returns 1 as the count. I want to repeat this for every pair in {subsets of X columns} x {subsets of Y columns}.

I need to first construct all of these subset pairs (using, for example, the function suggested here) and then compute the count for each pair. What is the best way to do this?
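One possible way to sketch the procedure described above uses itertools to enumerate the subsets and a boolean mask to do the counting. This is only a sketch under assumptions: the helper name nonempty_subsets and the counts dictionary are hypothetical, and empty subsets are assumed to be excluded, which the question does not state explicitly.

```python
from itertools import chain, combinations, product

import pandas as pd

l = [['a',1,3,3,1,1,3,3,3],['b',1,1,3,1,3,3,1,3],['c',1,1,1,1,3,1,1,1]]
col = ['id','x1','x2','x3','x4','y1','y2','y3','y4']
df = pd.DataFrame(l, columns=col)

X = ['x1', 'x2', 'x3', 'x4']
Y = ['y1', 'y2', 'y3', 'y4']


def nonempty_subsets(columns):
    """Yield every non-empty subset of `columns` as a tuple (hypothetical helper)."""
    return chain.from_iterable(
        combinations(columns, r) for r in range(1, len(columns) + 1)
    )


# Compute the "cell equals 1" test once, so each pair only needs a column lookup.
is_one = df[X + Y].eq(1)

# Map (X-subset, Y-subset) -> number of rows that are 1 in all selected columns.
# Assumption: empty subsets are skipped.
counts = {}
for xs, ys in product(nonempty_subsets(X), nonempty_subsets(Y)):
    selected = list(xs + ys)
    counts[(xs, ys)] = int(is_one[selected].all(axis=1).sum())

# The example pair s1 from the question.
print(counts[(('x1', 'x3'), ('y2', 'y3', 'y4'))])
```

With four columns on each side this evaluates 15 x 15 = 225 pairs. The number of pairs grows exponentially with the number of columns, so this brute-force enumeration is only practical for small X and Y.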

edited Feb 20 '18 at 05:36

asked Feb 20 '18 at 03:43

geek2000

Do you mean literally all subsets, of all sizes? – DYZ Feb 20 '18 at 03:46
Yes, all subsets of all sizes! :-) – geek2000 Feb 20 '18 at 05:32

0 Answers