1

I am trying to find the minimum value of a one-dimensional NumPy array that looks like:

col = array(['6.7', '0.9', '1.3', '4', '1.8'], dtype='|S7')

using col.min(), which is not working.

I tried the view function, as suggested in "NumPy: get min/max from record array of numeric values", but it failed to recognize 'S7' as a valid field.

What is the best way to deal with this problem? Should I have specified the data type while reading the values or while using the min function?

mrig

3 Answers

4

The problem is that you have an array of strings, not an array of numbers. You therefore need to convert the array to an appropriate type first:

In [38]: col.astype(np.float64).min()
Out[38]: 0.90000000000000002

should I have specified the data type while reading the values or while using the min function

If you know the input to be numeric, it would make sense to specify the data type when reading the data.
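As a minimal sketch of specifying the type at read time: the question does not say how the values are loaded, so the in-memory text below is a hypothetical stand-in for the real data source, and np.loadtxt is only one of several readers that accept a dtype argument.

```python
import io

import numpy as np

# Hypothetical input: one value per line, standing in for the real file.
raw = io.StringIO("6.7\n0.9\n1.3\n4\n1.8\n")

# Asking for float64 up front means the array never holds strings.
col = np.loadtxt(raw, dtype=np.float64)

print(col.dtype)
print(col.min())
```

With a numeric dtype in place, col.min(), col.max() and the other reductions work directly, with no conversion step at the call site.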

NPE
  • Thanks, that helped. Though I saw some rounding in the values, which did not happen when using the key=float argument as suggested by @mgilson. – mrig Jan 30 '13 at 19:30
2

An alternative is to use Python's built-in min function together with the key keyword argument:

>>> import numpy as np
>>> col = np.asarray(['6.7', '0.9', '1.3', '4', '1.8'])
>>> min(col,key=float)
'0.9'
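Note that min with key=float returns the original string element, not a float, which is why the result prints exactly as it was typed. If both the original text and its numeric value are needed, one possible sketch (the variable names are illustrative only) is to take the index of the minimum from a float copy of the array:

```python
import numpy as np

col = np.asarray(['6.7', '0.9', '1.3', '4', '1.8'])

# Convert once, then locate the position of the smallest number.
as_float = col.astype(np.float64)
idx = as_float.argmin()

smallest_text = col[idx]        # the string as it appears in the array
smallest_value = as_float[idx]  # the same entry as a float

print(idx, smallest_text, smallest_value)
```

The same index can be reused to look up the matching entry in any other array that is aligned with col.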
mgilson
0

If you won't need to do many other numerical operations and you have a reason for preferring the data to reside in str format, you can always use the native Python min and max operating on a plain list of your data:

In [98]: col = np.asarray(['6.7', '0.9', '1.3', '4', '1.8'])

In [99]: col
Out[99]:
array(['6.7', '0.9', '1.3', '4', '1.8'],
      dtype='|S3')

In [100]: col.min()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-100-1ce0c6ec1def> in <module>()
----> 1 col.min()

TypeError: cannot perform reduce with flexible type

In [101]: col.tolist()
Out[101]: ['6.7', '0.9', '1.3', '4', '1.8']

In [102]: min(col.tolist())
Out[102]: '0.9'

In [103]: max(col.tolist())
Out[103]: '6.7'

In general, this isn't a good way to handle numerical data: min and max compare the strings lexicographically here, not numerically, so the result is only correct when the strings happen to sort in the same order as the numbers they represent. But it's another option to consider if you have a special reason for working with strings (for example, you only ever calculate the min and max and all you do with them is display them).

ely
  • You're making some pretty strong assumptions about the strings here: `min('00.0','0.9')` doesn't give what you'd want... – mgilson Jan 30 '13 at 19:15
  • Also, you shouldn't need to convert to a list in order to use the builtin `min` function. I'd be shocked if it couldn't iterate over a numpy array properly. – mgilson Jan 30 '13 at 19:17
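Both comments can be checked with a short sketch. The first pair of values comes from the comment above; the second list is made up for illustration.

```python
import numpy as np

# Strings are compared character by character, not by numeric value.
pair = ['00.0', '0.9']
lexicographic_min = min(pair)          # '0.9', because '.' sorts before '0'
numeric_min = min(pair, key=float)     # '00.0', because 0.0 < 0.9

# Illustrative values where the plain string comparison also goes wrong.
values = ['10', '9', '2.5']
wrong = min(values)                    # '10'
right = min(values, key=float)         # '2.5'

# The built-in min iterates over a NumPy array directly; no tolist() needed.
direct = min(np.asarray(values), key=float)

print(lexicographic_min, numeric_min, wrong, right, direct)
```

The key=float form keeps the data as strings while still ordering them by numeric value.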