Why does the NumPy max function (np.max) return the wrong output?

Question

I have a pandas DataFrame that I converted to a NumPy ndarray. I apply the max function to one of its columns like this:

print('column: ',df[:,3])
print('max: ',np.max(df[:,3]))

And the output was:

column: [0.6559999999999999 0.48200000000000004 0.9990000000000001 ..., 1.64 nan 0.07]
max: 0.07

But as you can see, the first value, for example, is greater than 0.07! What is the problem?

Can you maybe store the df[:,3] into a separate dataframe and find its max? For me, it seems to be working — Abhimanyu Shekhawat, Aug 28 '20 at 12:08
Yeah, 3 column will also do fine, basically minimum data for me to reproduce this on my end. — Abhimanyu Shekhawat, Aug 28 '20 at 12:24
Are you sure that you are properly making an array from dataframe? np.max(df.values[:,3]) — Stas Buzuluk, Aug 28 '20 at 12:28
If you can show a small and complete sample of your code, it will be easier to debug. — Abhimanyu Shekhawat, Aug 28 '20 at 12:33
Hm. Ok, that starts to be interesting... Can you please provide a minimum reproducible example, since I am unable to reproduce the example you provided. — Stas Buzuluk, Aug 28 '20 at 12:34
You also can use df.iloc[3].max() to substitute using numpys function if that suits your needs. — Stas Buzuluk, Aug 28 '20 at 12:38
I don't know how to upload data to be usable for you because my dataframe has 119400 row and 7 column — Mohammad Sadra Sharifzadeh, Aug 28 '20 at 12:40
I've edited my answer so it should suits better. If you want to read more about converting columns data types: https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas — Stas Buzuluk, Sep 01 '20 at 12:31

Stas Buzuluk · Accepted Answer · 2021-03-01T13:29:13.220

There are two problems here:

1. The column you are taking the maximum of appears to have the object data type. That is not recommended when the column is supposed to hold numerical data, since it can cause unpredictable behaviour well beyond this particular case. Check the data types of your DataFrame with df.dtypes and convert the column to the type you expect (here, df[column_name].astype(np.float64)). This is also the reason np.nanmax does not work properly.
2. You don't want to use np.max on arrays containing NaNs.
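A minimal sketch of the dtype check described above. The DataFrame and its column names "a" and "b" are hypothetical stand-ins for the real data, not the asker's columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "a": np.array([0.7, 0.5, np.nan, 0.07]).astype(object),  # stored as object
    "b": np.array([0.7, 0.5, np.nan, 0.07]),                  # stored as float64
})

print(df.dtypes)   # one dtype per column

arr = df.values    # columns with different dtypes collapse into an object ndarray
print(arr.dtype)

col = arr[:, 0].astype(np.float64)  # cast the column before reducing it
print(col.dtype)
```

If the array built from the DataFrame reports object, every column slice taken from it is an object array too, even when the values look like plain floats.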

Solution

If you are sure the column should keep the object data type:

1.1. You can use the max method of the Series; it should cast the data to float automatically.

df.iloc[:, 3].max()

1.2. You can cast the data to the proper type just for the nanmax call.

np.nanmax(df.values[:,3].astype(np.float64))

1.3. You can drop all NaNs from the column and then find the max [not recommended]:
```
np.max(df[column_name].dropna().values)
```

If the data is really float64 and should not be stored with the object data type [recommended]:

df[column_name] = df[column_name].astype(np.float64)

np.nanmax(df.values[:,3])
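If the column might also contain entries that cannot be cast with astype (stray strings, for instance), pd.to_numeric with errors="coerce" is an alternative way to do the conversion. This is a swapped-in technique, not the one used above, and the column name "value" and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as objects, plus one non-numeric entry.
df = pd.DataFrame(
    {"value": np.array([0.7, "bad", 1.64, np.nan, 0.07], dtype=object)}
)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
df["value"] = pd.to_numeric(df["value"], errors="coerce")

# The column is now float64, so nanmax can skip the NaNs reliably.
result = np.nanmax(df["value"].to_numpy())
print(df["value"].dtype, result)
```

Coercing silently hides bad entries, so it may be worth counting the NaNs before and after the conversion.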

Code to illustrate problem

#python
import pandas as pd
import numpy as np 

test_data = pd.DataFrame({
                   'objects_column': np.array([0.7,0.5,1.0,1.64,np.nan,0.07]).astype(object),
                   'floats_column': np.array([0.7,0.5,1.0,1.64,np.nan,0.07]).astype(np.float64)})

print("********Using np.max function********")
print("Max of objects array:", np.max(test_data['objects_column'].values))
print("Max of floats array:", np.max(test_data['floats_column'].values))

print("\n********Using max method of series function********")
print("Max of objects array:", test_data["objects_column"].max()) 
print("Max of floats array:", test_data["floats_column"].max())

Returns:

********Using np.max function********
Max of objects array: 0.07
Max of floats array: nan

********Using max method of series function********
Max of objects array: 1.64
Max of floats array: 1.64
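One way to see where the 0.07 for the object array comes from: with object data, the reduction has to fall back on ordinary Python comparisons, and every comparison involving NaN is false. The sketch below is a simplified pure-Python model of that behaviour, not NumPy's actual implementation; pairwise_max is a hypothetical helper:

```python
import math
from functools import reduce

values = [0.7, 0.5, 1.0, 1.64, math.nan, 0.07]

def pairwise_max(a, b):
    # Keep a only when "a >= b" holds; any comparison with NaN is False,
    # so NaN replaces the running maximum and is then replaced itself.
    return a if a >= b else b

result = reduce(pairwise_max, values)
print(result)  # 0.07
```

The running maximum reaches 1.64, is replaced by NaN, and NaN is then replaced by the next value, 0.07, so everything before the NaN is forgotten.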

score 1 · Answer 2 · answered Aug 28 '20 at 12:30

np.max is an alias for np.amax, which according to the documentation propagates NaN values: if the array contains a NaN, the result is NaN. To ignore NaN values, use np.nanmax instead.
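A short sketch of the difference on a float64 array (the sample values are made up for illustration):

```python
import numpy as np

values = np.array([0.7, 0.5, 1.0, 1.64, np.nan, 0.07])  # float64 array

plain_max = np.max(values)      # NaN propagates through the reduction
nan_aware = np.nanmax(values)   # NaNs are ignored

print(plain_max)   # nan
print(nan_aware)   # 1.64
```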

jovany merham

That's a good assumption but not a correct answer. It looks like a real problem was related to an improper data type. As specified in numpy.amax documentation in case if there's nan in array - amax returns nan, which is not the case in this situation. https://numpy.org/doc/stable/reference/generated/numpy.amax.html – Stas Buzuluk Aug 31 '20 at 13:54
There's a discussion that extends question a little bit: https://chat.stackoverflow.com/rooms/220618/discussion-between-mohammad-sadra-sharifzadeh-and-stas-buzuluk – Stas Buzuluk Aug 31 '20 at 14:17
