
I've got a numpy array that contains some numbers and strings in separate columns:

import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

print(a[a[:, 0].argsort()])

However, when I try to sort it based on the first column using .argsort(), it's sorted in string order, not numeric order:

[['1e-05' 'C']
 ['2' 'B']
 ['3e-05' 'A']]

How do I go about getting the array to sort in numeric order based on the first column?

Pranav Hosangadi
John Westlund
  • Does this answer your question? [Sorting arrays in NumPy by column](https://stackoverflow.com/questions/2828059/sorting-arrays-in-numpy-by-column) – Carlos Horn Jan 04 '23 at 10:21
  • While your list of lists contains numbers and strings, the array you made from it is just strings. That should be clear from the sorted output. To get a numeric sort, you need numbers, not just strings that look like numbers. Have you considered using the Python sort with key? – hpaulj Jan 04 '23 at 16:38
  • @CarlosHorn Not quite -- that solution works if none of the numbers in the array are in e-notation. – John Westlund Jan 05 '23 at 02:21
  • I edited your title because the key here is that the numpy array was created using floats and strings, and was converted to an array of strings. BTW "e-notation" is nothing special. It just denotes a regular `float` `a*(10**b)` as `aEb`. The numbers themselves are still the same floating-point numbers. – Pranav Hosangadi Jan 05 '23 at 02:34
  • If you are not forced to use numpy, I would recommend using pandas, which IMHO is a better choice for representing data of various types. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html#pandas-dataframe-sort-values to solve your sorting problem. – Carlos Horn Jan 05 '23 at 08:16

2 Answers

3

In this case, a is an array of strings, as evidenced by a.dtype being '<U32'. Therefore, a[:, 0].argsort() will sort the column in lexical order.
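To see the string conversion and the resulting ordering directly, here is a minimal sketch using the array from the question. The variable names first_col and string_order are only illustrative.

```python
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# kind 'U' means every element was converted to a Unicode string
print(a.dtype.kind)

first_col = a[:, 0]
print(first_col)

# Strings compare character by character, so '2' sorts before '3e-05'
print('1e-05' < '2' < '3e-05')

# argsort on the string column therefore returns the lexicographic order
string_order = first_col.argsort()
print(string_order)
```

The first characters '1', '2' and '3' decide the order here, which is why the value 2 ends up in the middle.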

To sort a column as numbers, it needs to be converted to numbers first, by calling .astype before .argsort:

import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

print(a[a[:, 0].astype(float).argsort()])

Output:

[['1e-05' 'C']
 ['3e-05' 'A']
 ['2' 'B']]
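A comment under the question suggests Python's built-in sort with a key instead. The following is a rough sketch of that idea, assuming it is acceptable to end up with a plain list of rows; the name rows_sorted is only illustrative.

```python
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# Convert to nested Python lists, then sort the rows by the
# first entry interpreted as a float rather than as a string.
rows_sorted = sorted(a.tolist(), key=lambda row: float(row[0]))
print(rows_sorted)
```

This gives the same row order as the .astype(float).argsort() approach, but as a list of lists of strings rather than an array.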
Pranav Hosangadi
John Westlund
1

If you have control over the creation of the array, you could create a structured array instead of a regular array.

import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

a = np.array( [( 3e-05, 'A' ),
               ( 2, 'B' ),
               ( 1e-05, 'C' )], dtype=dtypes)

Now, a is a structured array with separate dtypes for the first and second columns -- the first column is an array of floats, and the second column is an array of strings.

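Note that the array is defined as a list of tuples. This is important: defining it as a list of lists and then specifying dtype=dtypes won't work, because NumPy treats each tuple as one record but treats the items of an inner list as separate array elements.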
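If the data already exists as a list of lists, one possible workaround is to convert each row to a tuple first. This is a sketch, with rows and records as illustrative names:

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

# Data as it might arrive: a list of lists
rows = [[3e-05, 'A'],
        [2, 'B'],
        [1e-05, 'C']]

# Each tuple becomes one record of the structured array
records = [tuple(row) for row in rows]
a = np.array(records, dtype=dtypes)

print(a.shape)      # one-dimensional: one entry per record
print(a['value'])   # the float field
print(a['label'])   # the string field
```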

Now, you can sort by a column like so:

a_sorted = np.sort(a, order=['value'])

which gives:

array([(1.e-05, 'C'), (3.e-05, 'A'), (2.e+00, 'B')],
      dtype=[('value', '<f8'), ('label', '<U32')])

You can get a row or column of this structured array like so:

>>> a_sorted[0]
(1.e-05, 'C')

>>> a_sorted['value']
array([1.e-05, 3.e-05, 2.e+00])
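If you need the sort indices themselves, for example to sort in descending order, calling argsort on the numeric field should also work. This is a sketch along those lines; order, ascending and descending are illustrative names.

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

a = np.array([(3e-05, 'A'),
              (2, 'B'),
              (1e-05, 'C')], dtype=dtypes)

# Indices that sort the records by the float field
order = a['value'].argsort()

ascending = a[order]
descending = a[order[::-1]]   # reverse the indices for largest-first

print(ascending['label'])
print(descending['label'])
```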
Pranav Hosangadi