max() and min() return np.nan when the NumPy array starts with np.nan

Question

The max() and min() functions both return np.nan if I have an array that starts with np.nan.

Here is where it works as expected:

>>> column_data = np.array([111, np.nan, 112, np.nan, 115, np.nan, 116, np.nan, 117, np.nan, 118, np.nan, 119])
>>> print(max(column_data))
119.0
>>> print(min(column_data))
111.0

Now I add an np.nan at the beginning of the array, and it breaks:

>>> column_data = np.array([np.nan, 111, np.nan, 112, np.nan, 115, np.nan, 116, np.nan, 117, np.nan, 118, np.nan, 119])
>>> print(max(column_data))
nan
>>> print(min(column_data))
nan

I've tried filtering out the nan elements, but the result is still the same:

>>> print(max(i for i in column_data if i is not np.nan))
nan
>>> print(min(i for i in column_data if i is not np.nan))
nan

What happened here and how do I fix this?

for max() to give me 119 and min() to give me 111, like my first example where it was working fine — Dan, Jan 14 '20 at 06:35

U13-Forward · Accepted Answer · 2020-01-14T06:51:50.557

Solution and a bit of explanation:

As @user2357112supportsMonica points out in the comments, "The filter is failing because the objects retrieved from the array to represent the NaN value are different objects from np.nan". Use np.nanmin and np.nanmax instead, which ignore NaN values:

print(np.nanmin(column_data))
print(np.nanmax(column_data))

Output:

111.0
119.0

See inequality comparison of numpy array with nan to a scalar for more info.
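
A small sketch of why the position of the NaN matters and why the identity filter fails. The array contents and variable names here are made up for illustration. The built-in max() keeps its current candidate unless a later item compares greater, and every ordering comparison against NaN is False:

```python
import numpy as np

# Illustrative arrays (not the asker's exact data).
nan_first = np.array([np.nan, 111, 112, 119])
nan_later = np.array([111, np.nan, 112, 119])

# Ordering comparisons involving NaN are always False.
print(np.nan > 111, 111 > np.nan)  # False False

# Built-in max() starts with the first item and only replaces it when a
# later item compares greater. A leading NaN is therefore never replaced,
# while a NaN in the middle is simply never picked.
first_result = max(nan_first)
later_result = max(nan_later)
print(first_result, later_result)  # nan 119.0

# Indexing the array creates a new NumPy scalar each time, so the element
# is never the same object as np.nan and "is not np.nan" is always True.
element = nan_first[0]
print(element is np.nan)   # False
print(np.isnan(element))   # True
```

The same reasoning applies to min(), which only replaces its candidate when a later item compares less.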

Documentation:

As mentioned in the documentation's notes:

NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax.
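
To illustrate the note above, here is a minimal comparison of the propagating and NaN-ignoring reductions, using a shortened, made-up version of the array from the question:

```python
import numpy as np

# Shortened illustrative array with a leading NaN.
column_data = np.array([np.nan, 111, np.nan, 112, 119])

# np.max propagates NaN regardless of where it sits in the array.
propagated_max = np.max(column_data)

# np.nanmax / np.nanmin skip NaN values.
ignored_max = np.nanmax(column_data)
ignored_min = np.nanmin(column_data)

print(propagated_max)            # nan
print(ignored_min, ignored_max)  # 111.0 119.0
```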

edited Jan 14 '20 at 06:51

answered Jan 14 '20 at 06:35


The fact that `np.nan != np.nan` has nothing to do with this. Even if things were changed so that `np.nan == np.nan`, the results would be the same. The filter is failing because the objects retrieved from the array to represent the NaN value are different objects from `np.nan`. – user2357112 Jan 14 '20 at 06:39
Thank you, this works! Also would you know why it fails to work when the array starts with np.nan but still works well when it does not? – Dan Jan 14 '20 at 06:40
@Dan this is due to the way comparisons to `nan` values are handled. See: https://stackoverflow.com/questions/25345843/inequality-comparison-of-numpy-array-with-nan-to-a-scalar – sshashank124 Jan 14 '20 at 06:42
@Dan Added documentation link – U13-Forward Jan 14 '20 at 06:55

sshashank124 · Answer 2 · 2020-01-14T06:40:37.543

This is because i is not np.nan never evaluates to False, so nothing ever gets filtered out. The correct way to test for nan is np.isnan(...). This should work correctly:

max(i for i in column_data if not np.isnan(i))

Also, you can use numpy methods for performing filtering, max and min as follows:

column_data[~np.isnan(column_data)].max()
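
A slightly fuller sketch of the mask-based approach, covering both min and max. The array is a shortened, made-up version of the one in the question, and the size check is an added precaution, since calling .max() on an empty array raises a ValueError:

```python
import numpy as np

# Shortened illustrative array with a leading NaN.
column_data = np.array([np.nan, 111, np.nan, 112, 119])

# Boolean mask that is True wherever the value is not NaN.
valid = ~np.isnan(column_data)
finite_values = column_data[valid]

# Guard against an all-NaN input, which would leave nothing to reduce.
if finite_values.size:
    lowest = finite_values.min()
    highest = finite_values.max()
    print(lowest, highest)  # 111.0 119.0
else:
    print("no non-NaN values")
```

Keeping finite_values around is useful when the filtered data is needed for further work beyond the two reductions.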

However, if you only wish to calculate max and min for non-nan values and not do anything else with the non-nan values, @U10-Forward's answer is the better approach.
