
Given this dataframe, 'df':

import pandas as pd

l = [['a',1,3,3,1,1,3,3,3],['b',1,1,3,1,3,3,1,3],['c',1,1,1,1,3,1,1,1]]
col = ['id','x1','x2','x3','x4','y1','y2','y3','y4']

df = pd.DataFrame(l, columns=col)

df

I want to count the number of rows (ids) in which every selected column has the value 1, for each combination of a subset of the X = {x1,x2,x3,x4} columns and a subset of the Y = {y1,y2,y3,y4} columns. For example, for the subset s1 = {[x1,x3], [y2,y3,y4]}, the code is:

df[(df['x1']==1) & (df['x3']==1) & (df['y2'] == 1) & (df['y3'] == 1) & (df['y4'] == 1)].count()['id']
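The chained mask above can be generalized to any list of columns, so that no `&` expression has to be written by hand for each subset. A minimal sketch, assuming the same 'df'; `count_all_ones` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

l = [['a',1,3,3,1,1,3,3,3],['b',1,1,3,1,3,3,1,3],['c',1,1,1,1,3,1,1,1]]
col = ['id','x1','x2','x3','x4','y1','y2','y3','y4']
df = pd.DataFrame(l, columns=col)

def count_all_ones(frame, cols):
    """Hypothetical helper: count rows where every column in `cols` equals 1."""
    # eq(1) builds a boolean frame; all(axis=1) is True only if every selected column is 1
    return int(frame[list(cols)].eq(1).all(axis=1).sum())

# Same subset as the example: {[x1,x3], [y2,y3,y4]}
print(count_all_ones(df, ['x1', 'x3', 'y2', 'y3', 'y4']))  # 1
```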

This returns 1 as the count. I want to repeat this count for every element of {subsets of X columns} × {subsets of Y columns}.
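One way to enumerate {subsets of X} × {subsets of Y} is with the standard-library `itertools.combinations` (all subsets of each size) and `itertools.product` (the Cartesian product). A sketch, assuming empty subsets should be excluded; `nonempty_subsets` is a hypothetical helper name, and starting its range at 0 instead of 1 would include the empty subset:

```python
from itertools import chain, combinations, product

X = ['x1', 'x2', 'x3', 'x4']
Y = ['y1', 'y2', 'y3', 'y4']

def nonempty_subsets(items):
    """Hypothetical helper: all non-empty subsets of `items` as tuples, smallest first."""
    return list(chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1)))

# Every (X-subset, Y-subset) pair: 15 non-empty subsets on each side
pairs = list(product(nonempty_subsets(X), nonempty_subsets(Y)))
print(len(pairs))  # 225
```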

My plan is to first construct all of these subset pairs (for example, with the function suggested here) and then perform the count for each pair. What is the best way to do this?
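A sketch of the two-step plan, under the same assumption that empty subsets are skipped. It compares the data against 1 once, so each pair only needs a column selection and `all(axis=1)`, and it collects the results in a dataframe. The names `nonempty_subsets`, `is_one`, and the output columns (`x_subset`, `y_subset`, `n_ids`) are illustrative choices, not part of the question:

```python
from itertools import chain, combinations, product

import pandas as pd

l = [['a',1,3,3,1,1,3,3,3],['b',1,1,3,1,3,3,1,3],['c',1,1,1,1,3,1,1,1]]
col = ['id','x1','x2','x3','x4','y1','y2','y3','y4']
df = pd.DataFrame(l, columns=col)

X = ['x1', 'x2', 'x3', 'x4']
Y = ['y1', 'y2', 'y3', 'y4']

def nonempty_subsets(items):
    """Hypothetical helper: all non-empty subsets of `items` as tuples."""
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))

# Boolean frame computed once: True where a cell equals 1
is_one = df[X + Y].eq(1)

rows = []
for xs, ys in product(nonempty_subsets(X), nonempty_subsets(Y)):
    # A row matches only if all columns of both subsets are 1
    n = int(is_one[list(xs) + list(ys)].all(axis=1).sum())
    rows.append({'x_subset': xs, 'y_subset': ys, 'n_ids': n})

counts = pd.DataFrame(rows)
print(counts.sort_values('n_ids', ascending=False).head())
```

With 4 columns on each side this is only 225 pairs, so a plain loop is fast enough; the number of pairs grows as (2^|X| − 1)·(2^|Y| − 1), which is the real cost to watch for larger column sets.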

geek2000
