I have the following pandas DataFrame with a NaN in it.

import pandas as pd
df = pd.DataFrame([1,2,3,float('nan')], columns=['A'])
df

    A
0   1
1   2
2   3
3 NaN

I also have a list, filter_list, that I want to use to filter my DataFrame. But if I use the .isin() function, it does not detect the NaN: instead of True, I get False in the last row.

filter_list = [1, float('nan')]

df['A'].isin(filter_list)
0     True
1    False
2    False
3    False
Name: A, dtype: bool

Expected output:

0     True
1    False
2    False
3    True
Name: A, dtype: bool

I know that I can use .isnull() to check for NaNs, but here I have other values to check as well. I am using pandas version 0.16.0.

Edit: The list filter_list comes from the user, so it may or may not contain NaN. That's why I am using .isin().

Kathirmani Sukumar
  • 1
    This won't work because `NaN != NaN` in NumPy, so you'd have to filter the `NaN` values first and then filter the other values – EdChum Aug 05 '15 at 13:17
  • Is there a way I can create the `NaN` element in `filter_list` so that pandas understands it? – Kathirmani Sukumar Aug 05 '15 at 13:19
  • 1
    No I don't think so, for instance `df['A'] == float('nan')` still won't work, bottom line is you have to use `isnull` or `notnull` to test for `NaN` correctly – EdChum Aug 05 '15 at 13:20
  • 1
    Sanitize your user inputs! Don't let them input NaN! Fillna with some appropriate NA value and do the same to the user inputs. – firelynx Aug 05 '15 at 13:42
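
The point made in the comments above can be checked directly. The following is a minimal sketch, reusing the DataFrame from the question; the names `eq_mask` and `null_mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, float('nan')], columns=['A'])

# Comparing with == against NaN is False for every row,
# including the row that actually holds NaN.
eq_mask = df['A'] == float('nan')

# isnull() is the reliable way to flag the missing value.
null_mask = df['A'].isnull()

print(eq_mask.tolist())
print(null_mask.tolist())
```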

4 Answers

7

The float NaN has the interesting property that it is not equal to itself:

In [194]: float('nan') == float('nan')
Out[194]: False

isin checks for equality, so you can't use isin to check whether a value equals NaN. To check for NaNs it is best to use pd.isnull (or the Series method .isnull()).


In [200]: df['A'].isin([1]) | df['A'].isnull()
Out[200]: 
0     True
1    False
2    False
3     True
Name: A, dtype: bool
unutbu
  • The problem is the list `filter_list` comes from the user. So it might or might not have `NaN` – Kathirmani Sukumar Aug 05 '15 at 13:24
  • 2
    Either change the user interface so that `filter_nan` is an additional parameter and NaN is not included in `filter_list`, or else check `pd.isnull(filter_list).any()` and handle the cases accordingly. – unutbu Aug 05 '15 at 13:27
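
The second suggestion in the comment above (check whether the user's list contains NaN and handle that case separately) could be sketched roughly as follows. `isin_with_nan` is a hypothetical helper name, not part of pandas, and the sketch assumes that any null-like entry in the list (NaN or None) should match missing rows.

```python
import pandas as pd

def isin_with_nan(series, filter_list):
    """Hypothetical helper: like Series.isin, but a NaN in filter_list
    also matches the NaN rows of the series."""
    # Use object dtype so the user's values are kept as given.
    values = pd.Series(list(filter_list), dtype=object)
    # Assumption: both NaN and None count as "missing" here.
    has_nan = values.isnull().any()
    non_null = values[values.notnull()].tolist()

    # Ordinary membership test for the non-missing values.
    mask = series.isin(non_null)
    if has_nan:
        # Add the missing rows only when the user asked for them.
        mask = mask | series.isnull()
    return mask

df = pd.DataFrame([1, 2, 3, float('nan')], columns=['A'])
result = isin_with_nan(df['A'], [1, float('nan')])
print(result)
```

Because the NaN check is done on the list itself, the same call works whether or not the user's list contains NaN.
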
6

You could replace nan with a unique non-NaN value that will not occur in your list, say 'NA' or ''. For example:

In [23]: import numpy as np
    ...: import pandas as pd

In [24]: df = pd.DataFrame([1, 2, 3, np.nan], columns=['A'])

In [25]: filter_list = pd.Series([1, np.nan])

In [26]: na_equiv = 'NA'  # sentinel; must not be a legitimate value in the data

In [27]: df['A'].replace(np.nan, na_equiv).isin(filter_list.replace(np.nan, na_equiv))
Out[27]:
0     True
1    False
2    False
3     True
Name: A, dtype: bool
S Anand
2

I think that the simplest way is to use numpy.nan:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, 3, np.nan], columns=['A'])
filter_list = [1, np.nan]
df['A'].isin(filter_list)
shahar
1

If you really want to use isin() to match NaN, you can create a class that has the same hash as nan and returns True when compared to nan:

import numpy as np
import pandas as pd

class NAN(object):
    def __eq__(self, v):
        return np.isnan(v)

    def __hash__(self):
        return hash(np.nan)

nan = NAN()

df = pd.DataFrame([1,2,3,float('nan')], columns=['A'])
df.A.isin([1, nan])
HYRY
  • 1
    A much easier option is to write the following: `import numpy as np` and `df = pd.DataFrame([1,2,3,np.nan], columns=['A'])` – shahar Aug 20 '20 at 09:22