How can I get the unique rows of an array while preserving the order (of first appearance) of the rows in the result?

I tried the code below, along with several variations, but it always produces a single flat array instead of an array of rows.
np.array_equal compares two arrays element by element, position by position.

import numpy as np
unique = np.array([])
arr = np.array([[1,2,3,4,5],[3,4,5,6,7],[5,6,7,8,9],[7,8,9,0,1],[9,0,1,2,3],[1,2,3,4,5],[-8,-7,-6,-5,-4]])

u = 0
for idx, i in enumerate(arr):
    if np.array_equal(unique, i) == False:
        unique = np.append(unique, i, axis=None)
        u += 1

print (unique)
print(u)

>>> print (unique)
[ 1.  2.  3.  4.  5.  3.  4.  5.  6.  7.  5.  6.  7.  8.  9.  7.  8.  9.
  0.  1.  9.  0.  1.  2.  3.  1.  2.  3.  4.  5. -8. -7. -6. -5. -4.]
>>> print(u)
7
>>>

For this example, the expected result is an array with 6 unique rows.

[[1,2,3,4,5],[3,4,5,6,7],[5,6,7,8,9],[7,8,9,0,1],[9,0,1,2,3],[-8,-7,-6,-5,-4]]
jared
Majoris
  • The np.append documentation clearly states that with axis None, it flattens everything! In general `np.append` in a loop is not a good idea. Stick with list append if you must iterate. – hpaulj Aug 18 '23 at 00:32
  • @hpaulj Tried axis 1 `numpy.AxisError: axis 1 is out of bounds for array of dimension 1` – Majoris Aug 18 '23 at 00:35
  • Well, duh. The array (both) has to be 2d to use axis=1. Don't try random fixes! If you don't understand dimensions, and `np.concatenate`, `np.append` will just get you into trouble. – hpaulj Aug 18 '23 at 01:31
  • `unique` works by sorting, so that duplicates are next to each other. Finding duplicates without that sorting takes a lot more work. – hpaulj Aug 18 '23 at 03:25
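
The list-append approach suggested in the comments above could look roughly like the following. This is only a sketch: the names `seen` and `unique_rows` are illustrative and not from the original post, and it assumes the rows hold hashable scalars such as integers.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [3, 4, 5, 6, 7],
                [5, 6, 7, 8, 9],
                [7, 8, 9, 0, 1],
                [9, 0, 1, 2, 3],
                [1, 2, 3, 4, 5],
                [-8, -7, -6, -5, -4]])

seen = set()       # rows encountered so far, stored as hashable tuples (illustrative name)
unique_rows = []   # plain Python list; appending to it does not copy the data each time

for row in arr:
    key = tuple(row.tolist())
    if key not in seen:          # first time this row appears
        seen.add(key)
        unique_rows.append(row)

# Build the 2-D array once, after the loop, instead of calling np.append repeatedly.
unique = np.array(unique_rows)
print(unique)
print(len(unique))
```

Because rows are appended in iteration order, the result keeps the order of first appearance without any sorting, at the cost of a Python-level loop.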

1 Answer

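This can be done by passing the axis=0 argument to np.unique, which tells NumPy to treat each row as a single item and compare the rows to each other. On its own, np.unique returns the unique rows in sorted order, not in order of appearance. To preserve the order you can follow this answer: pass return_index=True to also get the index of the first occurrence of each unique row, sort those indices, and use them to select the unique rows of arr.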

import numpy as np

arr = np.array([[ 1,  2,  3,  4,  5],
                [ 3,  4,  5,  6,  7],
                [ 5,  6,  7,  8,  9],
                [ 7,  8,  9,  0,  1],
                [ 9,  0,  1,  2,  3],
                [ 1,  2,  3,  4,  5],
                [-8, -7, -6, -5, -4]])
_, idx = np.unique(arr, axis=0, return_index=True)
unique = arr[np.sort(idx)]
print(unique)

Output:

[[ 1  2  3  4  5]
 [ 3  4  5  6  7]
 [ 5  6  7  8  9]
 [ 7  8  9  0  1]
 [ 9  0  1  2  3]
 [-8 -7 -6 -5 -4]]
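
To see why the indices need sorting, it may help to look at what np.unique returns before the reordering step. The sketch below wraps the same two calls in a helper; the name `unique_rows_in_order` is hypothetical and only used here for illustration.

```python
import numpy as np

def unique_rows_in_order(a):
    """Return the unique rows of a 2-D array in order of first appearance.

    Hypothetical helper name; it only wraps the two calls used in the answer.
    """
    _, first_idx = np.unique(a, axis=0, return_index=True)
    return a[np.sort(first_idx)]

arr = np.array([[1, 2, 3, 4, 5],
                [3, 4, 5, 6, 7],
                [5, 6, 7, 8, 9],
                [7, 8, 9, 0, 1],
                [9, 0, 1, 2, 3],
                [1, 2, 3, 4, 5],
                [-8, -7, -6, -5, -4]])

# Without reordering, the unique rows come back sorted, so the
# [-8, -7, -6, -5, -4] row moves to the front even though it appears last in arr.
rows_sorted, idx = np.unique(arr, axis=0, return_index=True)
print(rows_sorted)
print(idx)  # index in arr of the first occurrence of each sorted row

# Sorting the first-occurrence indices restores the original order.
result = unique_rows_in_order(arr)
print(result)
```

The duplicate row at index 5 never shows up in `idx`, because only the first occurrence (index 0) is reported.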
jared