
In Python, numpy.unique can remove all duplicates from a 1D array very efficiently.

1) How can I remove duplicate rows or columns in a 2D array?

2) And what about nD arrays?
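
For reference, a minimal sketch of the 1D case this question starts from (np.unique is standard NumPy):

import numpy as np

a = np.array([1, 1, 2, 3, 3, 3])
print(np.unique(a))  # [1 2 3] -- duplicates removed, result sorted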

inspectorG4dget
Developer

3 Answers


If possible, I would use pandas.

In [1]: from pandas import *

In [2]: import numpy as np

In [3]: a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

In [4]: DataFrame(a).drop_duplicates().values
Out[4]: 
array([[1, 1],
       [2, 3],
       [5, 4]], dtype=int64)
root
  • `pandas` is not installed yet. Can you give some benchmarks? BTW, the input `array` should contain `float`s, not integers. Try with over 10k points. – Developer Dec 30 '12 at 09:45
  • Well, having `pandas` installed now, its performance is outstanding: for 30k 3D points with 10k duplicates (40k total), only 0.2 s. Wow! – Developer Dec 30 '12 at 09:59
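
A minimal sketch of how the commenter's benchmark could be reproduced (the data shape mirrors the comment; the timing harness and `numpy.random.default_rng` are assumptions, and actual numbers will vary by machine):

import time
import numpy as np
from pandas import DataFrame

# ~30k unique 3D float points plus 10k duplicated rows (40k total)
rng = np.random.default_rng(0)
unique_pts = rng.random((30_000, 3))
dupes = unique_pts[rng.integers(0, 30_000, size=10_000)]
a = np.vstack([unique_pts, dupes])

start = time.perf_counter()
deduped = DataFrame(a).drop_duplicates().values
print(deduped.shape, time.perf_counter() - start)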

The following is another approach, which performs much better than a for loop: about 2 s for 10k points plus 100 duplicates.

def tuples(A):
    # Recursively turn nested rows into tuples; scalars are not
    # iterable, so the TypeError branch returns them unchanged.
    try:
        return tuple(tuples(a) for a in A)
    except TypeError:
        return A

# Tuples are hashable, so a set drops the duplicate rows.
b = set(tuples(a))

The idea is inspired by the first part of Waleed Khan's answer, so there is no need for any additional package, and the approach may have further applications. It is also quite Pythonic, I guess.
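
A usage sketch on the same example array as in the pandas answer (converting the set back to an array via `sorted` is an assumption about the desired output; a plain set does not preserve row order):

import numpy as np

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
b = set(tuples(a))                 # {(1, 1), (2, 3), (5, 4)}
unique_rows = np.array(sorted(b))  # back to a 2D array, rows sorted
print(unique_rows)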

Developer

The numpy_indexed package solves this problem for the n-dimensional case (disclaimer: I am its author). In fact, solving this problem was the motivation for starting the package, but it has since grown to include a lot of related functionality.

import numpy as np
import numpy_indexed as npi

a = np.random.randint(0, 2, (3, 3, 3))
print(npi.unique(a))          # unique subarrays along the default (first) axis
print(npi.unique(a, axis=1))  # unique along the second axis
print(npi.unique(a, axis=2))  # unique along the third axis
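
For completeness: since NumPy 1.13, `np.unique` itself accepts an `axis` argument, which covers the duplicate-rows/columns case natively; a minimal sketch:

import numpy as np

a = np.random.randint(0, 2, (3, 3, 3))
print(np.unique(a, axis=0))  # unique subarrays along the first axis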
Eelco Hoogendoorn