11

In some data I am processing, I am encountering values of type float that are NaN, i.e. float('nan').

However, checking for them does not work as expected:

float('nan') == float('nan')
>> False

You can check for it with math.isnan, but as my data also contains strings (for example 'nan', but also other user input), that is not very convenient:

import math
math.isnan(float('nan'))
>> True
math.isnan('nan')
>> TypeError: must be real number, not str

In the ideal world I would like to check if a value is in a list of all possible NaN values, for example:

import numpy as np
if x in ['nan', np.nan, ... ]:
    # Do something
    pass

Now the question: how can I still use this approach but also check for float('nan') values? And why does float('nan') == float('nan') evaluate to False?

Jan Willem
  • `if x == 'nan' or isnan(x)`…?! – deceze Dec 08 '21 at 09:38
  • Better yet, use `isnan` and clean your data so you don't have to worry about 'or is it actually a string?' – hobbs Dec 08 '21 at 09:44
  • The `if x == 'nan' or isnan(x)` will raise a TypeError if x is not a real number. Cleaning up is one solution, but I would prefer to do everything in one go if possible. Also, I am still curious why `float('nan') == float('nan')` equals `False` – Jan Willem Dec 08 '21 at 09:50
  • "but as my data also contains strings" - it sounds like you should fix that issue first, then use `isnan`. – user2357112 Dec 08 '21 at 09:51
  • "Also I am still curious why float('nan') == float('nan') equals False" - because it was designed so `x != x` would be a convenient way to test for NaN in settings without an `isnan` function (common in the early days of IEEE 754), and because having the comparison produce `True` isn't actually more useful. – user2357112 Dec 08 '21 at 09:54
  • If you're doing a comparison `x == y` and `x` and `y` are both NaNs produced by invalid floating-point operations, `True` isn't actually a more useful outcome than `False`. On the other hand, if you're looking for a nan check, `x != x` does that, so `x == nan` doesn't need to be a nan check too. – user2357112 Dec 08 '21 at 09:56
  • @user2357112supportsMonica I had not realised your last point - `x != x` is cool. Side effects? – jtlz2 Dec 08 '21 at 10:33

3 Answers

11

Why not just wrap whatever you pass to math.isnan with another float conversion? If it was already a float (i.e. float('nan')) you just made a "redundant" call:

import math

def is_nan(value):
    return math.isnan(float(value))

And this seems to give your expected results:

>>> is_nan(float('nan'))
True
>>> is_nan('nan')
True
>>> is_nan(np.nan)
True
>>> is_nan(5)
False
>>> is_nan('5')
False

This will still raise a ValueError for non-numeric strings (other than 'nan'). If that's a problem, you can wrap the call in try/except. As long as the float conversion worked, there is no reason for isnan to fail, so we are basically catching non-numeric strings that may fail the float conversion:

def is_nan(value):
    try:
        return math.isnan(float(value))
    except ValueError:
        return False

Any non-numeric string is surely not a NaN value, so we return False.
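
As a slightly more defensive sketch of the same idea: float() raises TypeError rather than ValueError for inputs such as None or a list, and OverflowError for integers too large to fit in a float, so those would still escape the handler above. The name is_nan_safe is hypothetical and not part of the original answer.

```python
import math

def is_nan_safe(value):
    """Return True only if value converts to a float NaN (hypothetical helper)."""
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError, OverflowError):
        # TypeError: None, lists and other non-convertible objects.
        # ValueError: non-numeric strings such as 'hello'.
        # OverflowError: integers too large to represent as a float.
        return False
```

Note that float() ignores case and surrounding whitespace, so strings such as 'NaN' or ' nan ' are also treated as NaN by this check.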

Tomerikoo
4

It's very pedestrian, and a bit ugly, but why not just do the following?

import math
import numpy as np

if x in ['nan', np.nan, ... ] or math.isnan(x):
    # Do something
    pass
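
One caveat with the list membership test, shown as a small sketch: `in` compares by identity first and only then with ==, so a NaN is found only if it is the very same object as one in the list. That is why np.nan in [np.nan] is True while float('nan') in [np.nan] is False. Plain floats are used here instead of NumPy to keep the example self-contained; nan_a and nan_b are illustrative names.

```python
nan_a = float('nan')
nan_b = float('nan')  # a different object holding the same NaN value

same_object_found = nan_a in ['nan', nan_a]   # matched by identity
other_object_found = nan_b in ['nan', nan_a]  # not identical, and NaN != NaN
string_found = 'nan' in ['nan', nan_a]        # ordinary string equality

print(same_object_found, other_object_found, string_found)  # True False True
```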

I want to recommend a Yoda expression but haven't quite worked it out yet.

If you want to sweep everything under the carpet, put it in a lambda or function.

Following on from https://stackoverflow.com/a/64090763/1021819, you can have any() evaluate the checks lazily. The remaining problem is that if none of the earlier conditions evaluates to True, the math.isnan() call is executed and can still throw the TypeError. With lazy evaluation you can guard the math.isnan() call with a type check against str:

fn_list_to_check = [
    lambda x: x in ['nan', np.nan, ... ],
    # Guard: only call math.isnan when x is not a string
    lambda x: not isinstance(x, str) and math.isnan(x),
    ]

if any(f(x) for f in fn_list_to_check):
    # Do something
    pass

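Note the absence of square list brackets inside any(), i.e. any(...) rather than any([...]): a generator expression is evaluated lazily, whereas a list would be built in full first (who knew?).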

I think it's quite brilliant but equally ugly - choose your poison.
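
To illustrate the laziness point, a minimal sketch: with a generator expression, any() stops at the first truthy result, whereas a list comprehension runs every function before any() even sees the list. The calls list and the two check functions are made up for this illustration.

```python
calls = []

def first_check(x):
    calls.append('first')
    return True

def second_check(x):
    calls.append('second')
    return False

checks = [first_check, second_check]

# Generator expression: any() short-circuits once first_check returns True.
calls.clear()
any(f(0) for f in checks)
lazy_calls = list(calls)

# List comprehension: both functions run before any() is called.
calls.clear()
any([f(0) for f in checks])
eager_calls = list(calls)

print(lazy_calls)   # ['first']
print(eager_calls)  # ['first', 'second']
```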

For the second part of the question (why float('nan') != float('nan')), see

What is the rationale for all comparisons returning false for IEEE754 NaN values?
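
In practice, the IEEE 754 rule means that every equality or ordering comparison involving NaN is False and only != is True, which is what makes x != x usable as a NaN test. A short sketch with plain Python floats:

```python
import math

nan = float('nan')

print(nan == nan)   # False
print(nan != nan)   # True
print(nan < 1.0)    # False
print(nan >= 1.0)   # False

# x != x is True only for NaN, so it agrees with math.isnan for floats.
for x in (nan, 0.0, 1.5, float('inf')):
    assert (x != x) == math.isnan(x)
```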

jtlz2
-2

You can check for a NaN value like this:

def isNaN(num):
    if num == 'nan':
        return True
    return num != num