I want to check whether specific columns in a DataFrame contain NaN, and then remove the rows in which those columns contain NaN.

Here is my code, which does not work:

import numpy as np
import pandas as pd
from numpy import nan

df = pd.DataFrame(np.array([[nan, 2, 3], [nan, nan, 6], [nan, 8, 9]]),
                   columns=['a', 'b', 'c'])

for i in range(len(df.index)):
    print(type(df["b"].loc[i]))
    if df["b"].loc[i] is np.float64(nan):
        df = df.drop([i])
print(df)

But df["b"].loc[i] is np.float64(nan) evaluates to False for every row, so nothing is dropped, and the result is:

<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
    a    b    c
0 NaN  2.0  3.0
1 NaN  NaN  6.0
2 NaN  8.0  9.0

I can do this another way, but I want to know why the code above does not work.

The code that works is:

df1 = pd.DataFrame(np.array([[nan, 2, 3], [nan, nan, 6], [nan, 8, 9]]),
                   columns=['a', 'b', 'c'])

for i in range(len(df1.index)):
    if df1.isna()["b"].loc[i]:
        df1 = df1.drop([i])
print(df1)

1 Answer

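The reason is that the is operator tests object identity, not equality, so it is not a suitable way to test for NaN values. Each call to np.float64(nan) creates a new object, so df["b"].loc[i] is np.float64(nan) is always False. Comparing with == would not work either, because NaN is not equal to anything, including itself. Use a dedicated check such as np.isnan or pd.isna instead.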

Here is a post which discusses the topic in more detail.
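
As a rough illustration of the point about is and NaN, the sketch below compares the identity check from the question with dedicated NaN tests, and then drops the affected rows with dropna instead of a loop. The variable names (value, identity_check, cleaned, and so on) are made up for this example, and dropna(subset=...) is one possible approach, not necessarily the one the asker needs.

```python
import numpy as np
import pandas as pd

value = np.float64(np.nan)

# `is` compares object identity; np.float64(np.nan) builds a new object on each call.
identity_check = value is np.float64(np.nan)

# `==` does not help either: NaN compares unequal to everything, itself included.
equality_check = value == np.float64(np.nan)

# Dedicated NaN tests look at the value rather than the object.
isnan_check = bool(np.isnan(value))
isna_check = bool(pd.isna(value))

print(identity_check, equality_check, isnan_check, isna_check)

# Same data as in the question.
df = pd.DataFrame(np.array([[np.nan, 2, 3], [np.nan, np.nan, 6], [np.nan, 8, 9]]),
                  columns=["a", "b", "c"])

# Drop every row whose "b" value is NaN, without an explicit loop.
cleaned = df.dropna(subset=["b"])
print(cleaned)
```

If the check has to cover several columns, passing more names in subset (for example subset=["b", "c"]) drops a row when any of them is NaN.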

Yas