I want to check whether specific columns in a DataFrame contain NaN, and then remove the rows in which those columns contain NaN.
Here is my code, which does not work:
import numpy as np
import pandas as pd
from numpy import nan

df = pd.DataFrame(np.array([[nan, 2, 3], [nan, nan, 6], [nan, 8, 9]]),
                  columns=['a', 'b', 'c'])
for i in range(len(df.index)):
    print(type(df["b"].loc[i]))
    if df["b"].loc[i] is np.float64(nan):
        df = df.drop([i])
print(df)
But df["b"].loc[i] is np.float64(nan) evaluates to False for every row, so no row is dropped, and the output is:
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
     a    b    c
0  NaN  2.0  3.0
1  NaN  NaN  6.0
2  NaN  8.0  9.0
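The same behavior seems to be reproducible without a DataFrame. The snippet below is only a minimal sketch of what I observe: the variable names (x, same_object, equal_values, is_missing) are made up for illustration, and I am assuming that each call to np.float64(...) builds a new object, which would make the identity test fail.

```python
import numpy as np

x = np.float64(np.nan)

# Identity test against a freshly constructed NaN scalar:
# `is` compares object identity, not the stored value.
same_object = x is np.float64(np.nan)

# Equality test: under IEEE 754, NaN does not compare equal to NaN.
equal_values = bool(x == np.float64(np.nan))

# Value-based missing check provided by NumPy.
is_missing = bool(np.isnan(x))

print(same_object, equal_values, is_missing)
```

Here only the np.isnan check reports the value as missing; both the identity and the equality comparison come out False.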
I can get the result with different code, but I want to know why the code above does not work.
The code that works is:
df1 = pd.DataFrame(np.array([[nan, 2, 3], [nan, nan, 6], [nan, 8, 9]]),
                   columns=['a', 'b', 'c'])
for i in range(len(df1.index)):
    if df1.isna()["b"].loc[i]:
        df1 = df1.drop([i])
print(df1)
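For comparison, the same rows can probably be removed without an explicit loop. This is a sketch that assumes pandas' DataFrame.dropna with its subset parameter; the names df2 and cleaned are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array([[np.nan, 2, 3], [np.nan, np.nan, 6], [np.nan, 8, 9]]),
    columns=["a", "b", "c"],
)

# Drop only the rows where column "b" is missing;
# NaN values in other columns (here "a") are ignored.
cleaned = df2.dropna(subset=["b"])
print(cleaned)
```

Listing several column names in subset would apply the check to each of them, so a row is dropped when any listed column is NaN (the default how="any").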