In a pandas DataFrame, some columns are numeric, and some rows have NaN in one of these numeric columns.
I know how to select the numeric columns:
df.select_dtypes(include=np.number)
but how do I exclude the rows in which one of the numeric columns is NaN?
I'm sorry that my earlier description may not have been clear, so I've added more details to clarify it.
Say we have the following DataFrame with four columns: A, B, C, and D. Columns A and C have dtype object, and columns B and D have dtype float.
A(Object)  B(Float)  C(Object)  D(Float)
Apple      NaN       String1    1.0
Orange     2.0       NaN        3.0
Banana     4.0       String2    5.0
NaN        1.0       String3    2.0
Pear       NaN       String4    3.0
Melon      2.0       String5    NaN
I only want to remove the rows in which a numeric (float) column is NaN; rows in which only a non-numeric (object) column is NaN should NOT be removed.
The final result should be as follows:
A(Object)  B(Float)  C(Object)  D(Float)
Orange     2.0       NaN        3.0
Banana     4.0       String2    5.0
NaN        1.0       String3    2.0
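For reference, here is a minimal sketch of one way this result might be produced, assuming the example frame above is built by hand. The names `numeric_cols` and `result` are only illustrative. The idea is to pass the numeric column labels to `dropna` through its `subset` parameter, so that NaN values in the object columns are ignored:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (A and C hold strings, B and D are float).
df = pd.DataFrame({
    "A": ["Apple", "Orange", "Banana", np.nan, "Pear", "Melon"],
    "B": [np.nan, 2.0, 4.0, 1.0, np.nan, 2.0],
    "C": ["String1", np.nan, "String2", "String3", "String4", "String5"],
    "D": [1.0, 3.0, 5.0, 2.0, 3.0, np.nan],
})

# Labels of the numeric columns only (here: B and D).
numeric_cols = df.select_dtypes(include=np.number).columns

# Drop a row only if it has NaN in at least one of the numeric columns.
result = df.dropna(subset=numeric_cols)
print(result)
```

The rows for Orange, Banana, and the row whose A is NaN should remain, with their original index labels kept.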
I'm considering using a lambda and a pipeline. Any hint would be really appreciated!
Thanks a lot!
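If a lambda-based approach is preferred, the following is a rough sketch that assumes "pipeline" refers to `DataFrame.pipe`. It builds a boolean mask that is True for rows whose numeric columns are all non-NaN and uses it to filter the frame. The name `result` is only illustrative:

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({
    "A": ["Apple", "Orange", "Banana", np.nan, "Pear", "Melon"],
    "B": [np.nan, 2.0, 4.0, 1.0, np.nan, 2.0],
    "C": ["String1", np.nan, "String2", "String3", "String4", "String5"],
    "D": [1.0, 3.0, 5.0, 2.0, 3.0, np.nan],
})

# Keep rows where every numeric column is non-NaN; object columns are not checked.
result = df.pipe(
    lambda d: d[d.select_dtypes(include=np.number).notna().all(axis=1)]
)
print(result)
```

Written this way, the filter can sit inside a longer method chain without an intermediate variable for the column list.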