
In a pandas DataFrame, some columns are numeric, and some rows have NaN in one of these numeric columns.

I know how to select the numeric columns:

df.select_dtypes(include=np.number)

But how do I exclude the rows in which one of the numeric columns is NaN?

My earlier description may not have been clear, so here are more details.

Say we have the following DataFrame with four columns: A, B, C, and D. Columns A and C have dtype object, and columns B and D have dtype float.

A(Object)   B(Float)   C(Object)   D(Float)
Apple       NaN        String1     1.0
Orange      2.0        NaN         3.0
Banana      4.0        String2     5.0
NaN         1.0        String3     2.0
Pear        NaN        String4     3.0
Melon       2.0        String5     NaN

I want to remove only the rows in which a numeric (float) column is NaN. Rows in which only a non-numeric (object) column is NaN should NOT be removed.

The final result should look like this:

A(Object)   B(Float)   C(Object)   D(Float)
Orange      2.0        NaN         3.0
Banana      4.0        String2     5.0
NaN         1.0        String3     2.0

I'm considering using a lambda and a pipeline. Any hint would be really appreciated!

Thanks a lot!

Blue Sea

2 Answers


Let's try:

Data

df = pd.DataFrame({'A': [1,np.nan,-2,0,0], 'B': [0, 0, 0, 3, -2], 'C' : [0, 0, -2, np.nan, 0], 'D': [0, -3, 2, 1, -2]} )  

Solution

df1 = df.dropna(axis=0)
wwnde
  • Thanks for the reply. However, this code removes every row that contains a NaN, without checking whether the column is numeric (such as float). A row whose only NaN is in a non-numeric column (such as object) should not be removed. – Blue Sea Oct 11 '20 at 01:23
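
The difference the comment describes can be illustrated with the question's own example table. The following is only a sketch, assuming the data matches that table: it contrasts a plain `dropna()` with one possible fix, passing just the numeric column names to the `subset` parameter of `dropna`. The variable names (`dropped_all`, `numeric_cols`, `result`) are illustrative and do not come from the thread.

```python
import numpy as np
import pandas as pd

# Example data from the question: A and C hold strings, B and D hold floats.
df = pd.DataFrame({
    "A": ["Apple", "Orange", "Banana", np.nan, "Pear", "Melon"],
    "B": [np.nan, 2.0, 4.0, 1.0, np.nan, 2.0],
    "C": ["String1", np.nan, "String2", "String3", "String4", "String5"],
    "D": [1.0, 3.0, 5.0, 2.0, 3.0, np.nan],
})

# Plain dropna() removes every row with a NaN in ANY column,
# so the Orange row (NaN in C) and the NaN-in-A row are lost as well.
dropped_all = df.dropna()

# Restrict the NaN check to the numeric columns only.
numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
result = df.dropna(subset=numeric_cols)

print(result)
```

With this data, `result` should keep the Orange, Banana, and NaN-in-A rows, which is the outcome the question asks for, while `dropped_all` keeps only the Banana row.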

df = pd.DataFrame({'A': [1,np.nan,-2,0,0], 'B': [0, 0, 0, 3, -2], 'C' : [0, 0, -2, np.nan, 0], 'D': [0, -3, 2, 1, -2]} )

df.dropna(inplace=True)

0 is the default axis. With inplace=True, df itself is modified and no new DataFrame is returned.

hd1
  • Thanks for your reply. However, this code removes every row that contains a NaN, without checking whether the column is numeric (such as float). A row whose only NaN is in a non-numeric column (such as object) should not be removed. – Blue Sea Oct 11 '20 at 01:43
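
Since the question mentions lambdas and pipelines, the numeric-only filter could also be written with `DataFrame.pipe`. This is a sketch rather than a definitive answer: `drop_rows_with_numeric_nan` is a hypothetical helper name, and the data is assumed to match the question's example table. The helper builds a boolean mask from the numeric columns, and the lambda variant passes those columns to `dropna(subset=...)`.

```python
import numpy as np
import pandas as pd

def drop_rows_with_numeric_nan(frame):
    """Hypothetical helper: keep rows whose numeric columns contain no NaN."""
    numeric = frame.select_dtypes(include=np.number)
    # True for rows where every numeric column has a value.
    keep = numeric.notna().all(axis=1)
    return frame[keep]

# Example data from the question: A and C hold strings, B and D hold floats.
df = pd.DataFrame({
    "A": ["Apple", "Orange", "Banana", np.nan, "Pear", "Melon"],
    "B": [np.nan, 2.0, 4.0, 1.0, np.nan, 2.0],
    "C": ["String1", np.nan, "String2", "String3", "String4", "String5"],
    "D": [1.0, 3.0, 5.0, 2.0, 3.0, np.nan],
})

# Variant 1: a named function passed to pipe.
result = df.pipe(drop_rows_with_numeric_nan)

# Variant 2: an inline lambda that restricts dropna to the numeric columns.
same = df.pipe(
    lambda d: d.dropna(subset=d.select_dtypes(include=np.number).columns.tolist())
)

print(result)
```

Both variants leave the original `df` untouched and return a filtered DataFrame, so either can sit in the middle of a longer method chain.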