
I have a function which takes an array-like argument and a value argument as inputs. During unit testing of this function (I use hypothesis), if a very large value is supplied (one that cannot be handled by np.float128), the function fails.

What is a good way to detect such values and handle them properly?

Below is the code for my function:

import numpy as np

def find_nearest(my_array, value):
    """ Find the nearest value in an unsorted array.
    """
    # Convert to a numpy array and drop NaN values.
    my_array = np.asarray(my_array, dtype=np.float128)
    my_array = my_array[~np.isnan(my_array)]

    return my_array[np.abs(my_array - value).argmin()]

Example which produces a wrong result:

find_nearest([0.0, 1.0], 1.8446744073709556e+19)

Returns: 0.0, but the correct answer is 1.0.

If I cannot return the correct answer, I would at least like to raise an exception. The problem is that I currently don't know how to identify bad inputs. A more general answer that fits other cases is preferable, as I see this as a recurring issue.

Newskooler

2 Answers


Beware: float128 isn't actually 128-bit precision! It's in fact a long double implementation: https://en.wikipedia.org/wiki/Extended_precision. The precision of this storage format is 63 fraction bits, which is why it fails around 1e+19: 2**63 is roughly 9.2e+18. Of course, if the differences in your array are more than 1, it will still be able to distinguish them at that magnitude; it simply means that whatever difference you want it to resolve must be larger than 1/2**63 of your input value.

What is the internal precision of numpy.float128? Here's an old answer that elaborates on the same point. I've run my own test and confirmed that np.float128 is exactly a long double with 63 fraction bits.
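To see what you actually got on a given platform, np.finfo reports the significand size. The numbers in the comments below assume a typical x86 Linux build, where np.float128 is the 80-bit long double; on other platforms the long double figures differ:

```python
import numpy as np

# Inspect the precision numpy actually provides on this platform.
info64 = np.finfo(np.float64)
info_ld = np.finfo(np.longdouble)  # np.float128 aliases longdouble where available

print("float64 fraction bits:", info64.nmant)      # 52 everywhere
print("longdouble fraction bits:", info_ld.nmant)  # 63 on x86 Linux; platform-dependent
print("longdouble eps:", info_ld.eps)
```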

I suggest you set a maximum for value, and if your value is larger than that, either:

  1. Reduce the value to that maximum, on the premise that everything in your array is going to be smaller than that number.

  2. Raise an error.

like this:

VALUE_MAX = 1e18
def find_nearest(my_array, value):
    if value > VALUE_MAX:
        value = VALUE_MAX
    ...

Alternatively, you can choose a more scientific approach, such as actually comparing your value to the extremes of the array:

def find_nearest(my_array, value):
    my_array = np.array(my_array, dtype=np.float128)
    array_max = np.amax(my_array)
    array_min = np.amin(my_array)
    if value > array_max:
        value = array_max
    elif value < array_min:
        value = array_min
    ...

This way you can be sure you never run into the problem, since your value will always be at most as large as the maximum of your array and at least as small as its minimum.
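Putting the pieces together, a complete runnable version of this clamping approach might look like the following (a sketch using float64, which suffices once the value is clamped into the array's range):

```python
import numpy as np

def find_nearest(my_array, value):
    """Find the nearest value in an unsorted array, clamping `value`
    into the array's range first so a huge input can't overwhelm the
    floating-point subtraction."""
    my_array = np.asarray(my_array, dtype=np.float64)
    my_array = my_array[~np.isnan(my_array)]
    # An out-of-range value is nearest to an endpoint, so clamping
    # preserves the answer while keeping the subtraction well-scaled.
    value = min(max(value, my_array.min()), my_array.max())
    return my_array[np.abs(my_array - value).argmin()]

print(find_nearest([0.0, 1.0], 1.8446744073709556e+19))  # 1.0
```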

Rocky Li
  • Okay, so on what basis do I decide the upper and lower bound of this `value`? – Newskooler Nov 26 '18 at 15:14
  • Depends on the difference you’re anticipating, if everything in the list has at least 1 as difference, you can set it at 1e18, etc – Rocky Li Nov 26 '18 at 15:15
  • Okay and do you think the `np.float128` is redundant? Also, can you add the `value` information to the answer please? – Newskooler Nov 26 '18 at 15:17
  • It is not, because otherwise you'll have float64 with 52 bits of precision, which is less than the 63 of long double. – Rocky Li Nov 26 '18 at 15:19
  • See update: the latter function should fix this problem with no regard to the value of `value` – Rocky Li Nov 26 '18 at 15:29

The problem here doesn't seem to be that a float128 can't handle 1.844...e+19, but rather that you probably can't add two floating point numbers with such radically different scales and expect to get accurate results:

In [1]: 1.8446744073709556e+19 - 1.0 == 1.8446744073709556e+19
Out[1]: True
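You can quantify this: math.ulp gives the gap between adjacent float64 values at a given magnitude, and at ~1.8e+19 that gap is 4096, so subtracting 1.0 cannot change the number:

```python
import math

big = 1.8446744073709556e+19
# Gap between adjacent float64 values at this magnitude:
print(math.ulp(big))     # 4096.0
print(big - 1.0 == big)  # True: the subtraction rounds straight back
```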

Your best bet, if you really need this amount of accuracy, would be to use Decimal objects and put them into a numpy array as dtype 'object':

In [1]: from decimal import Decimal

In [2]: big_num = Decimal(1.8446744073709556e+19)

In [3]: big_num  # Note the slight inaccuracies due to floating point conversion
Out[3]: Decimal('18446744073709555712')

In [4]: a = np.array([Decimal(0.0), Decimal(1.0)], dtype='object')

In [5]: a[np.abs(a - big_num).argmin()]
Out[5]: Decimal('1')

Note that this will be MUCH slower than typical numpy operations, because it has to fall back to Python objects for each computation rather than using numpy's optimized routines (numpy has no Decimal dtype).
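Wrapped back into the original function, a sketch of this approach could look as follows (the name find_nearest_decimal and the str() round-trip are my additions; going through str() preserves the literal the caller wrote rather than the nearest binary float):

```python
import numpy as np
from decimal import Decimal

def find_nearest_decimal(my_array, value):
    """Variant of find_nearest using exact Decimal arithmetic in an
    object-dtype array; slow, but immune to float cancellation."""
    arr = np.array([Decimal(str(x)) for x in my_array], dtype=object)
    value = Decimal(str(value))
    return arr[np.abs(arr - value).argmin()]

print(find_nearest_decimal([0.0, 1.0], 1.8446744073709556e+19))  # prints 1.0
```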

EDIT:

If you don't need this solution and just want to know if your current code will fail, I suggest the very scientific approach of "just try":

fails = len(set(my_array)) != len(set(my_array - value))

This checks that subtracting value from each unique number X in my_array yields a unique result. Exact subtraction always preserves distinctness, so if the set shrinks, the floating point arithmetic isn't precise enough to keep value - X distinct for different X.
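As a hedged sketch, that check can be wrapped in a helper (loses_precision is my name for it); np.asarray makes it accept plain lists, and it returns True exactly when distinct elements collapse:

```python
import numpy as np

def loses_precision(my_array, value):
    """True when subtracting `value` maps distinct array elements onto
    the same float, i.e. the nearest-value result can't be trusted."""
    my_array = np.asarray(my_array, dtype=float)  # accept plain lists too
    return len(set(my_array)) != len(set(my_array - value))

print(loses_precision([0.0, 1.0], 1.8446744073709556e+19))  # True
print(loses_precision([0.0, 1.0], 0.5))                     # False
```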

scnerd
  • Okay and what if I don't need such accuracy and would like to throw an error when such cases may occur (i.e. cases where I am performing operation on radically different numbers)? – Newskooler Nov 26 '18 at 15:11
  • btw when `my_array=[0, 0]` and `value=0.0`, I get a `TypeError: unsupported operand type(s) for -: 'list' and 'float'`. – Newskooler Nov 26 '18 at 16:17