-1

I need to delete rows from a dataframe if specific columns contains null values:

-> In this example, if col2 and col3 are null:

import pandas as pd

obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548],  'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }

df = pd.DataFrame(data=obj)



df.head()
    col1 col2   col3
0   1    741    Nan
1   2    332    2
2   7    7      74
3   47   Nan    Nan
4   12   127    127
5   67   Nan    Nan
6   58   548    548

After delete, the result should be:

df.head()
        col1 col2   col3
    0   1    741    Nan
    1   2    332    2
    2   7    7      74
    4   12   127    127
    6   58   548    548

Thanks for all!

Gizelly
  • 417
  • 2
  • 10
  • 24
  • You need "any two rows" to be Null for you to delete the row, or values on specific columns ? – rafaelc Oct 30 '19 at 16:50
  • If two specific columns are null I want to delete all rows that have this condition.In this example if col2 and col3 are null I want to delete all the line that has this condition. – Gizelly Oct 30 '19 at 17:32

3 Answers3

1

Use Boolean indexing with DataFrame.isna or DataFrame.isnull to check NaN or Null values. Select the maximum number of NaN allowed per rows with DataFrame.sum and Series.le:

df=df.replace('Nan',np.nan)
new_df=df[df.isnull().sum(axis=1).le(1)]
print(new_df)

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0

To specifict columns:

DataFrame.all

df=df.replace('Nan',np.nan)
df_filtered=df[~df[['col2','col3']].isnull().all(axis=1)]
print(df_filtered)

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
Community
  • 1
  • 1
ansev
  • 30,322
  • 5
  • 17
  • 31
  • 1
    If two specific columns are null I want to delete all rows that have this condition.In this example if col2 and col3 are null I want to delete all the line that has this condition. Sorry, I already edited a question. – Gizelly Oct 30 '19 at 17:36
  • I did the following and it didn't work out:: df2= df.drop(df[(df.col2.isnull()) & (df.col3.isnull())].index) – Gizelly Oct 30 '19 at 17:41
1

Using dropna

axis = 0 to delete rows, thresh=1 has the number of non-null values required to drop the row.

You can use subset=['col2', 'col3'] if you want to define the columns on which the as the basis of dropping rows.

You can try this:

df = df.dropna(axis=0, subset=['col2', 'col3'], how="any", thresh=1)
pissall
  • 7,109
  • 2
  • 25
  • 45
  • I did the following and it didn't work out:: df2= df.drop(df[(df.col2.isnull()) & (df.col3.isnull())].index) – Gizelly Oct 30 '19 at 17:43
  • If two specific columns are null I want to delete all rows that have this condition.In this example if col2 and col3 are null I want to delete all the line that has this condition. Sorry, I already edited a question. – Gizelly Oct 30 '19 at 17:43
  • @Gizélly Edited my answer – pissall Oct 30 '19 at 17:49
0

After deploying the solution proposed by @ansev, everything worked:

import pandas as pd

obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548],  'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }

df = pd.DataFrame(data=obj)

df=df.replace('Nan',np.nan)
df_filtered=df[~df[['col2','col3']].isnull().all(axis=1)]

print(df_filtered)

col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
Gizelly
  • 417
  • 2
  • 10
  • 24