3

Suppose I have the following dataframe in df:

a     | b     | c
------+-------+-------
5     | 2     | 4
NaN   | 6     | 8
5     | 9     | 0
3     | 7     | 1

If I do df.loc[df['a'] == 5], it correctly returns the first and third rows, but if I do df.loc[df['a'] == np.nan], it returns nothing.

I think this is more of a Python thing than a pandas one. Comparing np.nan against anything evaluates to False, even np.nan == np.nan, so the question is: how should I test for np.nan?

luisfer
  • 3
    The target is a little more complicated but basically you do the null checking with `df['a'].isnull()` or `pd.isnull(df['a'])`. And the selection is easy after that: `df[df['a'].isnull()]` – ayhan Sep 14 '16 at 16:05
  • 1
    you can use numpy.isnan() which gives you a boolean array of the same shape as the input array – dnalow Sep 14 '16 at 16:08
  • 2
    In general, I'd avoid using `np.isnan` on DataFrames. It's not as robust as `pd.isnull`, which has the same functionality. For example, compare what happens when you try `np.isnan(df['a'])` with `pd.isnull(df['a'])` when `df = pd.DataFrame({'a': ['x', np.nan, 'y']})`. – root Sep 14 '16 at 16:41
  • Thanks guys, I used both `isnull()` and `isnan()` and got the same results, which did what I wanted. Why didn't you post your answers as answers? – luisfer Sep 14 '16 at 17:38
  • 1
    This tutorial could be helpful: https://chartio.com/resources/tutorials/how-to-check-if-any-value-is-nan-in-a-pandas-dataframe – estebanpdl Sep 14 '16 at 21:39
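
The robustness difference raised in root's comment can be sketched as follows, using the object-dtype frame from that comment. This is a minimal illustration rather than a definitive statement of either library's behavior; the variable names `mask` and `isnan_failed` are introduced here for the example, and the exact error message may differ between NumPy versions.

```python
import numpy as np
import pandas as pd

# Object-dtype column mixing strings and a missing value (from root's comment).
df = pd.DataFrame({'a': ['x', np.nan, 'y']})

# pd.isnull works on object dtype and flags only the missing entry.
mask = pd.isnull(df['a'])
print(mask.tolist())

# np.isnan is only defined for numeric input, so it raises TypeError here.
try:
    np.isnan(df['a'])
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)
```

On a purely numeric column such as the one in the question, both functions give the same mask, which is consistent with what luisfer reports above.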

1 Answer

5

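Try using `isnull`, like so:

    import pandas as pd
    import numpy as np

    # A single unnamed column, so its label is the integer 0
    a = [1, 2, 3, np.nan, 5, 6, 7]
    df = pd.DataFrame(a)

    # The boolean mask keeps only the rows where column 0 is NaN
    df[df[0].isnull()]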
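
Applied to the frame from the question, the same idea might look like the sketch below. The frame is rebuilt by hand from the table in the question, and the names `empty`, `missing`, and `present` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({'a': [5, np.nan, 5, 3],
                   'b': [2, 6, 9, 7],
                   'c': [4, 8, 0, 1]})

# Equality against NaN is False for every row, so this selects nothing.
empty = df.loc[df['a'] == np.nan]

# isnull() builds the mask that the comparison cannot; notnull() is its inverse.
missing = df.loc[df['a'].isnull()]
present = df.loc[df['a'].notnull()]

print(len(empty))
print(missing)
print(present)
```

Here `missing` should hold only the second row of the table, and `present` the other three.
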
Tommy