I have the following data frame.
   A1  A2  A3  B1  B2  B3  C1  C2  C3
0   0   0   1   1   1   1   0   1   1
1   0   0   0   0   0   0   0   0   0
2   1   1   1   0   1   1   1   1   1
I want to filter it by groups of columns, keeping only the rows that have at least one non-zero value in every group. I wrote the following to achieve this:
import pandas as pd

df = pd.read_csv("TEST_TABLE.txt", sep='\t')
print(df)

# Column groups to test independently
group1 = ['A1', 'A2', 'A3']
group2 = ['B1', 'B2', 'B3']
group3 = ['C1', 'C2', 'C3']

# Keep rows with at least one non-zero value in every group
df2 = df[(df[group1] != 0).any(axis=1)
         & (df[group2] != 0).any(axis=1)
         & (df[group3] != 0).any(axis=1)]
print(df2)
The output was perfect:
   A1  A2  A3  B1  B2  B3  C1  C2  C3
0   0   0   1   1   1   1   0   1   1
2   1   1   1   0   1   1   1   1   1
Now, how can I modify the code so that I can impose a threshold in place of "any", i.e. retain only the rows that have at least 2 non-zero values in each group? The final output would then be:
   A1  A2  A3  B1  B2  B3  C1  C2  C3
2   1   1   1   0   1   1   1   1   1
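To make the goal concrete, here is a minimal sketch of the kind of threshold filter I have in mind. It assumes that counting the non-zero cells per row with `sum(axis=1)` and comparing the count against a threshold is an acceptable replacement for `any`. The data frame is rebuilt inline from the table above so the snippet runs without TEST_TABLE.txt, and the names `groups`, `min_nonzero`, `mask` and `df3` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table above (stands in for TEST_TABLE.txt)
df = pd.DataFrame(
    {
        "A1": [0, 0, 1], "A2": [0, 0, 1], "A3": [1, 0, 1],
        "B1": [1, 0, 0], "B2": [1, 0, 1], "B3": [1, 0, 1],
        "C1": [0, 0, 1], "C2": [1, 0, 1], "C3": [1, 0, 1],
    }
)

groups = [
    ["A1", "A2", "A3"],
    ["B1", "B2", "B3"],
    ["C1", "C2", "C3"],
]
min_nonzero = 2  # illustrative threshold: non-zero values required per group

# Start with every row selected, then narrow the mask group by group
mask = pd.Series(True, index=df.index)
for cols in groups:
    # (df[cols] != 0) is a boolean frame; summing across columns counts the True cells
    mask &= (df[cols] != 0).sum(axis=1) >= min_nonzero

df3 = df[mask]
print(df3)
```

With `min_nonzero = 1` this should select the same rows as the `any` version, which is why a count-based test looks like a natural generalisation.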
Thanks in advance.