How to remove duplicate arrays from an array of arrays in a Pythonic way?

Question

I want to iterate over an array of arrays and skip to the next array if I've already read an identical one. The following code works, but I'm looking for a more Pythonic solution.

from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
X = iris.data[:, :2]

read = []
for x in X:
    temp = True
    for r in read:
        if np.array_equal(x, r):
            temp = False
    if temp:
        read.append(x)
        # do some stuff

Type and content of X:

>>> type(X)
<class 'numpy.ndarray'>

>>> X
array([[5.1, 3.5],
       [4.9, 3. ],
       [4.9, 3. ],
       [4.7, 3.2],
       [4.6, 3.1],
       [5. , 3.6],
       ...
       [5.9, 3. ]])

For example, when I read [4.9, 3. ] the first time I do some stuff. When I read [4.9, 3. ] again I skip to the next array.

Where is `read` list used? – Ozzy Walsh Sep 01 '18 at 17:16
Sorry, there was an error. – user3420714 Sep 01 '18 at 17:36

score 0 · Accepted Answer · answered Sep 01 '18 at 17:16

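You can use numpy.unique with axis=0 to drop duplicate rows. Because numpy.unique returns the unique rows in sorted order, preserving the original order takes one extra step: request the index of each row's first occurrence with return_index=True, sort those indices, and index your array with them. Then just iterate over the result.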

Here's a minimal example:

A = np.array([[5.1, 3.5],
              [4.9, 3. ],
              [4.9, 3. ],
              [4.7, 3.2]])

_, idx = np.unique(A, axis=0, return_index=True)

print(A[np.sort(idx)])

[[5.1 3.5]
 [4.9 3. ]
 [4.7 3.2]]
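
To tie this back to the loop in the question, here is a minimal sketch that wraps the same steps in a helper and iterates over the de-duplicated rows. The helper name `unique_rows_in_order`, the small stand-in array `X`, and the `processed` list are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np


def unique_rows_in_order(a):
    """Return the distinct rows of a 2-D array in order of first appearance.

    Hypothetical helper name, used here only for illustration.
    """
    # np.unique sorts the rows; return_index gives, for each unique row,
    # the index of its first occurrence in the original array.
    _, idx = np.unique(a, axis=0, return_index=True)
    # Sorting those indices restores the original row order.
    return a[np.sort(idx)]


# Small stand-in for the question's X (assumed data, not the iris set).
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [4.9, 3.0],
              [4.7, 3.2],
              [5.1, 3.5]])

processed = []
for x in unique_rows_in_order(X):
    # The question's "do some stuff" would go here; this placeholder
    # only records each row that gets visited.
    processed.append(x.tolist())
```

Each distinct row is visited once, so the `read` list and the inner comparison loop are no longer needed. As with np.array_equal in the question, rows count as duplicates only when their values match exactly.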
