3

I have a 3-D array, b, shown below, which I treat as an array of 2-D arrays. I want to remove the duplicate 2-D arrays and keep only the unique ones.

>>> a = [[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]]
>>> b = numpy.array(a)
>>> b
array([[[1, 2],
        [1, 2]],

       [[1, 2],
        [4, 5]],

       [[1, 2],
        [1, 2]]])

In the example above, I want to get the following, because there is one duplicate that I want to remove.

unique = array([[[1, 2],
                 [1, 2]],

                [[1, 2],
                 [4, 5]]])

How should I do this with NumPy? Thanks.

chen

4 Answers

0

See the previous answer, "Remove duplicate rows of a numpy array": convert the array to an array of tuples, then apply np.unique().
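As an alternative that is not part of the linked answer: on NumPy 1.13 or later, np.unique accepts an axis argument, which removes duplicate slices along the first axis without any conversion to tuples. A minimal sketch, assuming such a NumPy version is available (the name unique_slices is just illustrative):

```python
import numpy as np

b = np.array([[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]])

# axis=0 treats each 2-D slice b[i] as one item to compare.
# The result is sorted, so the original order is not necessarily kept.
unique_slices = np.unique(b, axis=0)
print(unique_slices)
```

The result comes back in sorted order rather than in order of first appearance, which may or may not matter for your use case.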

Ian Conway
0

Reshape to 2-D so that each 2-D slice becomes one row, find the unique rows, then reshape back.

The unique rows are found by converting each row to a tuple and collecting the tuples in a set.

import numpy as np
a = [[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]]
b = np.array(a)

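# Flatten each 2-D slice into a hashable tuple (one row per slice).
new_array = [tuple(row) for row in b.reshape(b.shape[0], -1)]
# A set drops duplicates but does not preserve the original order.
uniques = list(set(new_array))

# Restore the original slice shape.
output = np.array(uniques).reshape(-1, *b.shape[1:])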
output

Out[131]: 
array([[[1, 2],
        [1, 2]],

       [[1, 2],
        [4, 5]]])
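
Because a set has no defined order, the slices above can come back in any order. If first-seen order matters, one possible variation is to use dictionary keys instead of a set. This is only a sketch, assuming Python 3.7+ (where dicts keep insertion order); unique_slices_ordered is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

def unique_slices_ordered(arr):
    """Hypothetical helper: drop duplicate slices along axis 0, keeping first-seen order."""
    rows = arr.reshape(arr.shape[0], -1)
    # dict keys keep insertion order (Python 3.7+), unlike a set.
    seen = dict.fromkeys(tuple(row) for row in rows)
    return np.array(list(seen), dtype=arr.dtype).reshape(-1, *arr.shape[1:])

b = np.array([[[1, 2], [4, 5]], [[1, 2], [1, 2]], [[1, 2], [4, 5]]])
result = unique_slices_ordered(b)
print(result)
```

Here the slice [[1, 2], [4, 5]] stays first because it appears first in the input.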
p-robot
  • This solution doesn't work: `ValueErrorTraceback (most recent call last) in () 6 uniques = np.unique(new_array) 7 ----> 8 output = uniques.reshape(len(uniques), 2, 2) 9 output ValueError: cannot reshape array of size 4 into shape (4,2,2)` – richar8086 Dec 13 '17 at 11:47
  • Good point. Fixed now to use a `set` to find uniques. Thanks. – p-robot Dec 14 '17 at 17:25
0

Converting to tuples and back again is probably going to be quite expensive. Instead, you can view each flattened slice as a single opaque (void) element:

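import numpy as np

def unique_by_first(a):
    # Flatten each slice along the first axis into one row.
    tmp = a.reshape(a.shape[0], -1)
    # View each row as a single void element so np.unique compares whole rows.
    b = np.ascontiguousarray(tmp).view(np.dtype((np.void, tmp.dtype.itemsize * tmp.shape[1])))
    # idx holds the first occurrence of each unique row.
    _, idx = np.unique(b, return_index=True)
    return a[idx].reshape(-1, *a.shape[1:])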

Usage:

print(unique_by_first(np.array(a)))
[[[1 2]
  [1 2]]

 [[1 2]
  [4 5]]]

Effectively, a generalization of previous answers.
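
To make the view step less opaque, here is a small sketch of what it produces for the array from the question. The variable names rows, row_bytes and packed are illustrative only:

```python
import numpy as np

b = np.array([[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]])

# One flattened row per 2-D slice: shape (3, 4).
rows = np.ascontiguousarray(b.reshape(b.shape[0], -1))
# Number of bytes occupied by one full row.
row_bytes = rows.dtype.itemsize * rows.shape[1]
# Each row becomes a single void element: shape (3, 1).
packed = rows.view(np.dtype((np.void, row_bytes)))

print(rows.shape, packed.shape)
# Identical slices have identical bytes; different slices do not.
same = packed[0, 0].tobytes() == packed[2, 0].tobytes()
different = packed[0, 0].tobytes() != packed[1, 0].tobytes()
print(same, different)
```

Since every row collapses to one element, np.unique can compare whole slices at once instead of individual numbers.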

Daniel
0

You can map each 2-D slice (the last two axes) to a single scalar by treating its elements as indices into a multi-dimensional grid and computing the corresponding flat (linear) index. Identical slices map to the same scalar and different slices map to different ones, so np.unique on those scalars keeps one instance of each slice. This assumes the array holds non-negative integers, since the values are used as indices.

Thus, an implementation would be -

idx = np.ravel_multi_index(a.reshape(a.shape[0], -1).T, a.max(0).ravel() + 1)
out = a[np.sort(np.unique(idx, return_index=True)[1])]

Sample run -

In [43]: a
Out[43]: 
array([[[8, 1],
        [2, 8]],

       [[3, 8],
        [3, 4]],

       [[2, 4],
        [1, 0]],

       [[3, 0],
        [4, 8]],

       [[2, 4],
        [1, 0]],

       [[8, 1],
        [2, 8]]])

In [44]: idx = np.ravel_multi_index(a.reshape(a.shape[0],-1).T,a.max(0).ravel()+1)

In [45]: a[np.sort(np.unique(idx, return_index=1)[1])]
Out[45]: 
array([[[8, 1],
        [2, 8]],

       [[3, 8],
        [3, 4]],

       [[2, 4],
        [1, 0]],

       [[3, 0],
        [4, 8]]])

If you don't need the slices to keep their original order, skip the np.sort() in the last step.
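
To see what the scalar mapping looks like, the sketch below applies the same steps to the small array from the question and prints the intermediate keys. The names flat, dims, keys and first are illustrative, and the approach assumes non-negative integer values:

```python
import numpy as np

a = np.array([[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]])

# One flattened row per slice: shape (3, 4).
flat = a.reshape(a.shape[0], -1)
# Grid size per position: column-wise maximum plus one -> [2, 3, 5, 6].
dims = flat.max(axis=0) + 1
# Linear index of each row on that grid; equal rows get equal keys.
keys = np.ravel_multi_index(flat.T, dims)
print(keys)  # [158 179 158]

# First occurrence of each key, restored to original order.
first = np.sort(np.unique(keys, return_index=True)[1])
out = a[first]
print(out)
```

For example, the row (1, 2, 1, 2) on a grid of shape (2, 3, 5, 6) has the linear index ((1*3 + 2)*5 + 1)*6 + 2 = 158. Note that the grid size grows with the product of dims, so for large values or many elements per slice the keys may not fit in an integer.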

Divakar