This answer introduces the thresh parameter, which is useful in some use cases.
Note: I added this answer because some questions have been marked as duplicates pointing to this page, yet none of the approaches here address their use case, e.g. a df in the format of the sample below.
This approach addresses:
- Dropping rows/columns with all NaN
- Keeping rows/columns with at least the desired number of non-NaN values (having valid data)
# Approaching rows
------------------
# Sample df
import pandas as pd

df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'Sunday': [2, None, 3, 3],
                   'Tuesday': [0, None, 3, None],
                   'Wednesday': [None, None, 4, None],
                   'Friday': [1, None, 7, None]})
print(df)
Names Sunday Tuesday Wednesday Friday
0 Name1 2.0 0.0 NaN 1.0
1 Name2 NaN NaN NaN NaN
2 Name3 3.0 3.0 4.0 7.0
3 Name4 3.0 NaN NaN NaN
# Keep only the rows with at least 2 non-NA values (df itself is left unchanged).
print(df.dropna(thresh=2))
Names Sunday Tuesday Wednesday Friday
0 Name1 2.0 0.0 NaN 1.0
2 Name3 3.0 3.0 4.0 7.0
3 Name4 3.0 NaN NaN NaN
# Keep only the rows with at least 3 non-NA values (df itself is left unchanged).
print(df.dropna(thresh=3))
Names Sunday Tuesday Wednesday Friday
0 Name1 2.0 0.0 NaN 1.0
2 Name3 3.0 3.0 4.0 7.0
# Approaching columns: axis=1 is needed to apply the drop to columns
--------------------------------------------------------------------
# With axis=0 (the default), rows are dropped, as in the examples above
# original df
print(df)
Names Sunday Tuesday Wednesday Friday
0 Name1 2.0 0.0 NaN 1.0
1 Name2 NaN NaN NaN NaN
2 Name3 3.0 3.0 4.0 7.0
3 Name4 3.0 NaN NaN NaN
# Keep only the columns with at least 2 non-NA values (df itself is left unchanged).
print(df.dropna(axis=1, thresh=2))
Names Sunday Tuesday Friday
0 Name1 2.0 0.0 1.0
1 Name2 NaN NaN NaN
2 Name3 3.0 3.0 7.0
3 Name4 3.0 NaN NaN
# Keep only the columns with at least 3 non-NA values (df itself is left unchanged).
print(df.dropna(axis=1, thresh=3))
Names Sunday
0 Name1 2.0
1 Name2 NaN
2 Name3 3.0
3 Name4 3.0
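To see why a given thresh keeps or drops a row or column, it can help to look at the non-NaN counts it compares against. The following is a minimal sketch, not a description of how pandas implements dropna internally; the names row_counts, col_counts, rows_kept and cols_kept are only illustrative. Note that the Names column counts as a non-NaN value in every row.

```python
import pandas as pd

# Same sample df as above
df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'Sunday': [2, None, 3, 3],
                   'Tuesday': [0, None, 3, None],
                   'Wednesday': [None, None, 4, None],
                   'Friday': [1, None, 7, None]})

# Number of non-NaN values in each row (the Names column is included)
row_counts = df.notna().sum(axis=1)
# Number of non-NaN values in each column
col_counts = df.notna().sum(axis=0)

# Boolean-mask selections that should match thresh=2 on this df
rows_kept = df[row_counts >= 2]
cols_kept = df.loc[:, col_counts >= 2]

print(row_counts)
print(col_counts)
print(rows_kept.equals(df.dropna(thresh=2)))
print(cols_kept.equals(df.dropna(axis=1, thresh=2)))
```

Printing the counts first is a quick way to pick a sensible thresh value before dropping anything.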
Conclusion:
- The thresh parameter (see the DataFrame.dropna() docs) lets you choose the minimum number of non-NaN values a row/column must contain to be kept.
- The thresh parameter handles a dataframe with the structure above, which df.dropna(how='all') does not: the Names column is never NaN, so no row is ever entirely NaN.
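As a rough side-by-side check of that last point, the sketch below runs both calls on the sample df. The variable names all_result and thresh_result are just for illustration.

```python
import pandas as pd

# Same sample df as above
df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'Sunday': [2, None, 3, 3],
                   'Tuesday': [0, None, 3, None],
                   'Wednesday': [None, None, 4, None],
                   'Friday': [1, None, 7, None]})

# how='all' only drops rows where every value is NaN;
# the Name2 row still has a value in Names, so it is kept
all_result = df.dropna(how='all')

# thresh=2 requires at least two non-NaN values per row,
# so the Name2 row (only Names is set) is dropped
thresh_result = df.dropna(thresh=2)

print(all_result['Names'].tolist())
print(thresh_result['Names'].tolist())
```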