Start with the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
columns=['col1', 'col2', 'col3'])
df
col1 col2 col3
0 a b NaN
1 NaN c c
2 c d a
And say we want to keep rows with Nans in the columns col2
and col3
One way to do this is the following: which is based on the answers from this post
df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]
col1 col2 col3
0 a b NaN
So this gives us the rows that would be dropped if we dropped rows with Nans in the columns of interest. To keep the columns we can run the same code, but use a ~
to invert the selection
df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]
col1 col2 col3
1 NaN c c
2 c d a
this is equivalent to:
df.dropna(subset=['col2', 'col3'])
Which we can test:
df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])
True
You can of course test this on your own larger dataframes but should get the same answer.