How to delete rows from a dataframe if specific columns contains null values?

Question

I need to delete rows from a dataframe if specific columns contains null values:

-> In this example, if col2 and col3 are null:

import pandas as pd

obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548],  'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }

df = pd.DataFrame(data=obj)



df.head()
    col1 col2   col3
0   1    741    Nan
1   2    332    2
2   7    7      74
3   47   Nan    Nan
4   12   127    127
5   67   Nan    Nan
6   58   548    548

After delete, the result should be:

df.head()
        col1 col2   col3
    0   1    741    Nan
    1   2    332    2
    2   7    7      74
    4   12   127    127
    6   58   548    548

Thanks for all!

You need "any two rows" to be Null for you to delete the row, or values on specific columns ? — rafaelc, Oct 30 '19 at 16:50
If two specific columns are null I want to delete all rows that have this condition.In this example if col2 and col3 are null I want to delete all the line that has this condition. — Gizelly, Oct 30 '19 at 17:32

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

1

Use Boolean indexing with DataFrame.isna or DataFrame.isnull to check NaN or Null values. Select the maximum number of NaN allowed per rows with DataFrame.sum and Series.le:

df=df.replace('Nan',np.nan)
new_df=df[df.isnull().sum(axis=1).le(1)]
print(new_df)

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0

To specifict columns:

DataFrame.all

df=df.replace('Nan',np.nan)
df_filtered=df[~df[['col2','col3']].isnull().all(axis=1)]
print(df_filtered)

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0

edited Jun 20 '20 at 09:12

Community

1
1

answered Oct 30 '19 at 16:44

ansev

30,322
5
17
31

1

If two specific columns are null I want to delete all rows that have this condition.In this example if col2 and col3 are null I want to delete all the line that has this condition. Sorry, I already edited a question. – Gizelly Oct 30 '19 at 17:36
I did the following and it didn't work out:: df2= df.drop(df[(df.col2.isnull()) & (df.col3.isnull())].index) – Gizelly Oct 30 '19 at 17:41

pissall · Answer 2 · 2019-10-30T17:49:43.700

1

Using dropna

axis = 0 to delete rows, thresh=1 has the number of non-null values required to drop the row.

You can use subset=['col2', 'col3'] if you want to define the columns on which the as the basis of dropping rows.

You can try this:

df = df.dropna(axis=0, subset=['col2', 'col3'], how="any", thresh=1)

edited Oct 30 '19 at 17:49

answered Oct 30 '19 at 16:45

pissall

7,109
2
25
45

I did the following and it didn't work out:: df2= df.drop(df[(df.col2.isnull()) & (df.col3.isnull())].index) – Gizelly Oct 30 '19 at 17:43
If two specific columns are null I want to delete all rows that have this condition.In this example if col2 and col3 are null I want to delete all the line that has this condition. Sorry, I already edited a question. – Gizelly Oct 30 '19 at 17:43
@Gizélly Edited my answer – pissall Oct 30 '19 at 17:49

score 0 · Answer 3 · answered Oct 30 '19 at 18:24

After deploying the solution proposed by @ansev, everything worked:

import pandas as pd

obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548],  'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }

df = pd.DataFrame(data=obj)

df=df.replace('Nan',np.nan)
df_filtered=df[~df[['col2','col3']].isnull().all(axis=1)]

print(df_filtered)

col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0

How to delete rows from a dataframe if specific columns contains null values?

3 Answers3

DataFrame.all

Linked