The issues

So I have an array I imported containing values ranging from ~0.0 to ~0.76. When I started trying to find the min & max values using Numpy, I ran into some strange inconsistencies that I'd like to know how to solve if they're my fault, or avoid if they're programming errors on the Numpy developers' end.

The code

Let's start with finding the location of the maximum values using np.max & np.where.

print array.shape
print np.max(array)
print np.where(array == 0.763728955743)
print np.where(array == np.max(array))
print array[35,57]

The output is this:

(74, 145)
0.763728955743
(array([], dtype=int64), array([], dtype=int64))
(array([35]), array([57]))
0.763728955743

When I look for where the array exactly equals the maximum entry's value, Numpy doesn't find it. However, when I simply search for the location of the maximum values without specifying what that value is, it works. Note this doesn't happen in np.min.

Now I have a different issue regarding minima.

print array.shape
print np.min(array)
print np.where(array == 0.0)
print np.where(array == np.min(array))
print array[10,25], array[31,131]

Look at the returns.

(74, 145)
0.0
(array([10, 25]), array([ 31, 131]))
(array([10, 25]), array([ 31, 131]))
0.0769331747301 1.54220192172e-09

1.54e-9 is close enough to 0.0 that it seems like it would be the minimum value. But why is a location with the value 0.077 also listed by np.where? That's not even close to 0.0 compared to the other value.

The Questions

Why doesn't np.where seem to work when entering the maximum value of the array, but it does when searching for np.max(array) instead? And why does np.where() mixed with np.min() return two locations, one of which is definitely not the minimum value?

Mad Physicist
Cebbie
  • Might be a dup of - http://stackoverflow.com/questions/40939626/python-numpy-where-will-not-return-a-value-in-the-range – Divakar Dec 08 '16 at 17:53
  • Numpy might choose to represent `1.54e-9` as `0` when printing the value, but that doesn't mean that `1.54e-9` _equals_ `0`... Perhaps try printing `repr(np.min(array))`? – mgilson Dec 08 '16 at 17:57
  • What is the value of `np.max(array) - 0.763728955743`? Presumably it is not zero – Eric Dec 08 '16 at 18:02
  • @Eric, it's -3.00426350464e-13, which is essentially zero. – Cebbie Dec 08 '16 at 18:04
  • @mgilson: The output for repr(np.min(array)) is '0.0' – Cebbie Dec 08 '16 at 18:05
  • @ChristineB: "Essentially" zero is not zero, and `==` does not mean "essentially" equal. "Essentially equal" is [`numpy.isclose`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isclose.html). – user2357112 Dec 08 '16 at 18:08
  • @ChristineB -- "which is essentially zero" ... No it isn't :-). It's just _close_ to zero. An equality check (which is what numpy is doing) will never say that close values are _equal_. Have a look at [`np.isclose`](https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html) if you want to do a comparison of things that are _almost_ equal. – mgilson Dec 08 '16 at 18:09
  • As for your issue with minima, you're looking at the wrong cells. You want `array[10, 31]` and `array[25, 131]`, not `array[10, 25]` and `array[31, 131]`. (Also, I'd recommend against calling your arrays `array`.) – user2357112 Dec 08 '16 at 18:11
  • @user2357112: Thanks, both those arrays have values that exactly = 0.0. That's an awfully weird way to output array locations though. And np.isclose sort of answers my first question, I guess. Must be due to the way the program stores numbers where some very small decimal digit isn't shown but makes the inequality false. – Cebbie Dec 08 '16 at 18:16
  • Print `array == 0.763728955743`. Do you see any `True` values? `np.where` just looks at that boolean array, and returns the location of the `Trues`. So the failure here isn't in `where`, it is in `==`. – hpaulj Dec 08 '16 at 18:16
  • _"That's an awfully weird way to output array locations though"_ - it's designed so you can directly use the result for indexing, as in `x[np.where(x==1)]`. If you don't need this, `np.argwhere` outputs in a less weird format – Eric Dec 08 '16 at 19:48

2 Answers

You have two issues: the interpretation of floats and the interpretation of the results of np.where.

  1. Floating point numbers are stored internally in binary, and the decimal value that gets printed is usually a rounded version of what is stored. Similarly, most decimal literals cannot be represented exactly in binary. The printed 0.763728955743 is rounded to 12 significant digits and differs from the stored maximum by about 3e-13, which is why np.where(array == 0.763728955743) returns empty arrays, while np.where(array == np.max(array)) does the right thing. Note that the second case just uses the exact binary number internally without any conversions. The search for the minimum succeeds because 0.0 can be represented exactly in both decimal and binary. In general, it is a bad idea to compare floats using == for this and related reasons.
  2. For the version of np.where that you are using (a single condition argument), it devolves into np.nonzero. You are misinterpreting the results here because it returns one array of indices for each dimension of the input, not one array per matching coordinate. There are a number of ways of saying this differently:

    • If you had three matches, you would be getting two arrays back, each with three elements.
    • If you had a 3D input array with two matches, you would get three arrays back, each with two elements.
    • The first array is row-coordinates (dim 0) and the second array is column-coordinates (dim 1).
    • Notice how you are interpreting the output of where for the maximum case. This is correct, but it is not what you are doing in the minimum case.
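
To make the shape of the np.where output concrete, here is a minimal sketch on a small made-up array. The name grid and its values are illustrative only, not the asker's data. It shows that the result is one index array per dimension, and that pairing those arrays element-wise (or calling np.argwhere, as suggested in the comments) gives the actual coordinates.

```python
import numpy as np

# Hypothetical 3x4 array with zeros at (0, 2) and (2, 1).
grid = np.array([[5.0, 3.0, 0.0, 7.0],
                 [2.0, 9.0, 4.0, 1.0],
                 [8.0, 0.0, 6.0, 3.0]])

# One index array per dimension: rows for dim 0, cols for dim 1.
rows, cols = np.where(grid == 0.0)

# Pair the arrays element-wise to get (row, column) coordinates.
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)

# np.argwhere returns the same information as one row per match.
pairs = np.argwhere(grid == 0.0)
print(pairs.tolist())

# The per-dimension arrays can be used directly for indexing.
values = grid[rows, cols]
print(values)
```

Reading the result as columns of (rows[i], cols[i]) rather than as two separate coordinates is what distinguishes the correct maximum lookup from the mistaken minimum lookup in the question.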

There are a number of ways of dealing with these issues. The easiest could be to use np.argmax and np.argmin. These return the index of the first maximum or minimum in the flattened array, respectively, so for a 2D array you need np.unravel_index to convert that flat index back into coordinates.

>>> x = np.unravel_index(np.argmax(array), array.shape)
>>> print(x)
(35, 57)
>>> print(array[x])
0.763728955743

The only possible problem here is that you may want to get all of the coordinates.

In that case, using where or nonzero is fine. The only difference from your code is that you should print

print array[10,31], array[25,131]

instead of the transposed values as you are doing.

Mad Physicist

Try using numpy.isclose() instead of ==, because a floating point number rarely matches a rounded decimal literal exactly.

i.e. change this: np.where(array == 0.763728955743) to: np.where(np.isclose(array, 0.763728955743))
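
As a rough illustration of why the swap helps, the sketch below builds a tiny array around a made-up double (stored is a hypothetical stand-in for the real maximum, chosen to have more digits than the printout shows), rounds it to 12 significant digits the way the printed output was rounded, and compares both lookups.

```python
import numpy as np

# Hypothetical stand-in for the stored maximum: a double with more
# digits than a 12-significant-digit printout shows.
stored = 0.7637289557433004
data = np.array([[0.1, 0.2],
                 [stored, 0.3]])

# Round to 12 significant digits, mimicking the printed value.
printed = float("%.12g" % stored)

# Exact comparison against the rounded literal finds nothing.
exact_hits = np.where(data == printed)

# Tolerance-based comparison finds the intended element.
close_hits = np.where(np.isclose(data, printed))

print(printed, stored - printed)
print(exact_hits)
print(close_hits)
```

np.isclose uses default tolerances of rtol=1e-05 and atol=1e-08; if neighbouring values in the real data lie closer together than that, passing smaller rtol and atol values would be the thing to try.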

np.min() and np.max() work as expected for me. Also note you can provide an axis like arr.min(axis=1) if you want to.
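
For the axis argument mentioned above, a small sketch on a made-up array (arr and its values are illustrative only) might look like this:

```python
import numpy as np

# Hypothetical 2x3 array.
arr = np.array([[4.0, 1.0, 7.0],
                [2.0, 9.0, 3.0]])

row_min = arr.min(axis=1)  # minimum of each row
col_max = arr.max(axis=0)  # maximum of each column

print(row_min)
print(col_max)
```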

If this does not solve it, perhaps you could post some csv data somewhere to try to reproduce the problem? I kinda highly doubt it is a bug with numpy itself but you never know!

Alex G Rice