This is an extension to this question, where OP wanted to know how to drop rows where the values in a single column are NaN.
I'm wondering how I can drop rows where the values in 2 (or more) columns are both NaN. Using the second answer's created Data Frame:
In [1]: df = pd.DataFrame(np.random.randn(10,3))
In [2]: df.ix[::2,0] = np.nan; df.ix[::4,1] = np.nan; df.ix[::3,2] = np.nan;
In [3]: df
Out[3]:
0 1 2
0 NaN NaN NaN
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
If I use the drop.na()
command, specifically the drop.na(subset=[1,2])
, then it completes an "or" type drop and leaves:
In[4]: df.dropna(subset=[1,2])
Out[4]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
What I want is an "and" type drop, where it drops rows where there is an NaN
in column index 1 and 2. This would leave:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
where only the first row is dropped.
Any ideas?
EDIT: changed data frame values for consistency