Remove all rows that have False in many columns, as in the dataset below

Question

I need to drop all the rows that have all FALSE values from col2 to col6 using Python. I think "df = df[df.any(axis=1)]" would also take the "id" and "col1" columns into consideration, and I need to exclude those two columns.

Appreciate your help.

I tried looping over the columns of the data extracted from the CSV, as below:

import pandas as pd
columns_to_keep = ['col1'] 

for col in df.columns[1:]:
    if df[col].any():  
        columns_to_keep.append(col)  

filtered_df = df[columns_to_keep]

But I am getting the error: list indices must be integers or slices, not str

Is there some problem with my solution? Does your data contain `FALSE` strings instead of `False`? — jezrael, Jun 27 '23 at 05:09

jezrael · Answer 1 · 2023-06-21T08:33:58.503

Use DataFrame.loc with : to select all rows, and combine two column conditions: a test for at least one True in the column, and membership in the list of columns to always include:

include = ['id','col1']
out = df.loc[:, df.any() | df.columns.isin(include)]

print (out)
   id      col1   col5   col6
0   0  0.036492  False  False
1   1  0.017991  False   True
2   2  0.150298   True  False
3   3  0.065861  False  False

A more general solution is to test only the boolean columns with DataFrame.select_dtypes and add back the non-boolean columns with Series.reindex:

out = df.loc[:, df.select_dtypes('boolean').any().reindex(df.columns, fill_value=True)]

print (out)
   id      col1   col5   col6
0   0  0.036492  False  False
1   1  0.017991  False   True
2   2  0.150298   True  False
3   3  0.065861  False  False
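
For anyone who wants to run the column filter end to end, here is a minimal sketch. The sample frame is an assumption, reconstructed from the printed output in this answer, and it uses the 'bool' selector, which matches plain NumPy boolean columns.

```python
import pandas as pd

# Sample frame reconstructed from the printed output (an assumption, not the asker's file).
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "col1": [0.036492, 0.017991, 0.150298, 0.065861],
    "col2": [False, False, False, False],
    "col3": [False, False, False, False],
    "col4": [False, False, False, False],
    "col5": [False, False, True, False],
    "col6": [False, True, False, False],
    "col7": [False, False, False, False],
})

# One entry per boolean column: True if the column contains at least one True.
has_true = df.select_dtypes("bool").any()

# id and col1 are not boolean, so they are absent from has_true;
# reindex adds them back with True so they are always kept.
keep = has_true.reindex(df.columns, fill_value=True)

out = df.loc[:, keep]
print(out.columns.tolist())  # ['id', 'col1', 'col5', 'col6']
```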

If you want to remove rows instead:

exclude = ['id','col1']
out = df[df.drop(exclude, axis=1).any(axis=1)]

print (out)
   id      col1   col2   col3   col4   col5   col6   col7
1   1  0.017991  False  False  False  False   True  False
2   2  0.150298  False  False  False   True  False  False

out = df[df.select_dtypes('boolean').any(axis=1)]

print (out)
   id      col1   col2   col3   col4   col5   col6   col7
1   1  0.017991  False  False  False  False   True  False
2   2  0.150298  False  False  False   True  False  False
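
The row filters above test every column left after dropping id and col1 (or every boolean column), so col7 takes part in the test. Below is a minimal sketch, assuming the goal is to test only col2 through col6 as the question states, using a label-based slice. The sample frame is reconstructed from the printed output, except that col7 in row 3 is set to True as a hypothetical change to make the difference visible.

```python
import pandas as pd

# Reconstructed sample data; col7 in row 3 is changed to True (hypothetical)
# to show that col7 is ignored by the filter below.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "col1": [0.036492, 0.017991, 0.150298, 0.065861],
    "col2": [False, False, False, False],
    "col3": [False, False, False, False],
    "col4": [False, False, False, False],
    "col5": [False, False, True, False],
    "col6": [False, True, False, False],
    "col7": [False, False, False, True],
})

# Label-based slices include both endpoints, so this selects col2..col6 only.
flags = df.loc[:, "col2":"col6"]

# Keep rows that have at least one True among those five columns.
out = df[flags.any(axis=1)]
print(out.index.tolist())  # [1, 2]
```

With this data, a filter that tests all boolean columns would also keep row 3, because col7 is True there.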

score 1 · Accepted Answer · answered Jun 21 '23 at 08:32

If the columns hold the string "FALSE" rather than the boolean False, this gives the expected output:

out_df = df[~((df.iloc[:, 2:7] == "FALSE").all(axis=1))]

Output:

   id      col1   col2   col3   col4   col5   col6   col7
1   1   0.017991  FALSE  FALSE  FALSE  FALSE   TRUE  FALSE
2   2   0.150298  FALSE  FALSE  FALSE   TRUE  FALSE  FALSE
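
If the flags really are text, another option is to convert them to booleans first so that any() and all() behave as expected. This is a sketch under assumptions: the small frame and the flag_cols list are hypothetical, and the only text values are taken to be "TRUE" and "FALSE" in some letter case.

```python
import pandas as pd

# Hypothetical frame where the flag columns were read as text.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "col1": [0.036492, 0.017991, 0.150298],
    "col2": ["FALSE", "FALSE", "FALSE"],
    "col3": ["FALSE", "TRUE", "FALSE"],
})
flag_cols = ["col2", "col3"]  # hypothetical list of flag columns

# Normalize whitespace and case, then compare with "TRUE" to get real booleans.
df[flag_cols] = df[flag_cols].apply(
    lambda s: s.astype(str).str.strip().str.upper().eq("TRUE")
)

# Keep rows with at least one True among the flag columns.
out = df[df[flag_cols].any(axis=1)]
print(out["id"].tolist())  # [1]
```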

score -1 · Answer 3 · answered Jun 21 '23 at 11:34

To drop rows that have all False values in columns col2 to col6 while ignoring the id and col1 columns, select that column range by position with iloc and call the any method with axis=1. The resulting boolean mask is then passed to loc to keep the matching rows. Here's an example:

import pandas as pd

# data: the records loaded from the CSV (not shown in the original answer)
df = pd.DataFrame(data)

# Drop rows with all False values from col2 to col6 (excluding id and col1)
df = df.loc[df.iloc[:, 2:7].any(axis=1)]

print(df)
