
This is an extension to this question, where OP wanted to know how to drop rows where the values in a single column are NaN.

I'm wondering how I can drop rows where the values in 2 (or more) columns are both NaN. Using the DataFrame created in that question's second answer:

In [1]: df = pd.DataFrame(np.random.randn(10,3))

In [2]: df.iloc[::2, 0] = np.nan; df.iloc[::4, 1] = np.nan; df.iloc[::3, 2] = np.nan

In [3]: df
Out[3]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

If I use dropna(), specifically dropna(subset=[1, 2]), it performs an "or"-type drop and leaves:

In[4]: df.dropna(subset=[1,2])
Out[4]: 
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295

What I want is an "and"-type drop, which drops only rows that have a NaN in both columns 1 and 2. This would leave:

          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

where only the first row is dropped.

Any ideas?

EDIT: changed data frame values for consistency

Kevin M

5 Answers


Either of the following two:

df.dropna(subset=[1, 2], how='all')

or

df.dropna(subset=[1, 2], thresh=1)
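Both are equivalent here: with subset, how='all' drops a row only when every listed column is NaN, and thresh=1 keeps any row with at least one non-NaN value among them. A quick check against the question's setup (the values are random, but the NaN pattern is fixed):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values are random; only the NaN pattern matters)
df = pd.DataFrame(np.random.randn(10, 3))
df.iloc[::2, 0] = np.nan
df.iloc[::4, 1] = np.nan
df.iloc[::3, 2] = np.nan

a = df.dropna(subset=[1, 2], how='all')
b = df.dropna(subset=[1, 2], thresh=1)
# Only row 0 has NaN in both columns 1 and 2, so only row 0 is dropped
```

Note that recent pandas versions raise an error if you pass both how and thresh in the same call; use one or the other.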
Alicia Garcia-Raboso
    Now I feel like an idiot, should've picked up on that...thanks! – Kevin M Aug 24 '16 at 16:51
  • What if I want to delete NaN rows only when 2 (or more) consecutive rows have NaN values? A single NaN row (with non-NaN rows above and below) has to stay. – zeniapy Jun 15 '20 at 13:48
  • @zeniapy I think you have to iterate over the dataframe with `df.iterrows()` and, for each row that has some NaN values, check the rows before and after, dropping the current row if those have NaN values too. – KLaz Oct 14 '20 at 12:48
  • I'm using pandas 1.1.5 and I got: `supplying multiple axes to axis is no longer supported.` – Bruno Ambrozio Sep 29 '21 at 15:17
  • Never mind. If I add `subset=` then it works. Thanks. – Bruno Ambrozio Sep 29 '21 at 15:19

Specify how='all' in the dropna() call:

df.dropna(subset=[1,2], how='all')
ragesz

The accepted answer didn't work for me. I tried the following code and nothing changed in my dataframe.

df.dropna(subset=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], how='all', inplace=True)

EDIT: I think it didn't work because 'all' refers to all columns in the df, not all columns in the subset. The thresh parameter is also a bit confusing to me, because it is a condition for keeping rows, even though this is a dropping function. In the OP, if I understand correctly, a row should be kept if it has at least one non-NaN value, and dropped if both subset values are NaN. In my case, I want to drop a row only if all values in the subset columns are NaN, i.e. keep a row if any of the 6 values is non-NaN, and I don't know whether that corresponds to thresh=6.

These two commands also didn't work for me:

df.dropna(subset=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], thresh=6, inplace=True)
df.dropna(subset=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], how='all', thresh=6, inplace=True)

and when I added axis=1 or axis='columns' to the parameters, I got a KeyError saying the columns I was passing are not in the dataframe, and I don't know why.

I ended up iterating over the df like this, also because I realized that some supposedly-NaN values might not be caught, because they are None or empty strings:

cols = ["col1", "col2", "col3", "col4", "col5", "col6"]
indices_to_drop = list()
for index, row in df.iterrows():
    # pd.isna() covers both NaN and None; also treat empty strings as missing
    if all(pd.isna(row[col]) or row[col] == "" for col in cols):
        indices_to_drop.append(index)
df.drop(labels=indices_to_drop, inplace=True)
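For what it's worth, the loop above can usually be collapsed into a vectorized form: convert empty strings to NaN first, then let dropna(how='all') act on the subset. A sketch, using the answer's column names on toy data:

```python
import numpy as np
import pandas as pd

cols = ["col1", "col2", "col3", "col4", "col5", "col6"]
# Sample rows: all empty strings, all NaN, and one with real values
df = pd.DataFrame([[""] * 6, [np.nan] * 6, list("abcdef")], columns=cols)

# Treat empty strings as missing, then drop rows where all six columns are missing
cleaned = df.replace("", np.nan).dropna(subset=cols, how="all")
```

This drops the first two sample rows and keeps the third, matching what the explicit loop does.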
KLaz

I faced a similar issue where I had 45 features (columns) and wanted to drop rows with NaN values in only a selection of them, e.g. columns 7 to 45.

Step 1: I created a list (col_lst) from columns which I wanted to be operated for NaN

Step 2: df.dropna(axis=0, subset=col_lst, how='all', inplace=True)

The above step removed only those rows from the dataframe in which all (not just any) of columns 7 to 45 had NaN values.
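Step 1 can be done with a column slice rather than typing names out. A sketch on a toy 45-column frame (the 7-to-45 range is the answer's example):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the 45-feature one
df = pd.DataFrame(np.random.randn(4, 45))
df.iloc[0, 6:45] = np.nan  # row 0: columns 7..45 are all NaN

col_lst = df.columns[6:45].tolist()  # columns 7 to 45 (0-based positions 6..44)
df.dropna(axis=0, subset=col_lst, how='all', inplace=True)
```

Only row 0 is dropped, since it is the only row where every column in col_lst is NaN.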

ah bon

Try the code below; I hope it solves your problem:

drop_rows = df.dropna(how='any', subset=['Columns1', 'Columns2'])
Rugaya Bsa
  • No, you have to set `how='all'` since OP asked to remove a row if **both** columns are *NaN*. Your solution will also remove rows where only one of the two columns contains *NaNs*. – rachwa Jun 16 '22 at 18:34