I have found other methods, such as this, to remove duplicate elements from an array. My requirement is slightly different. If I start with:

array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 2, 1],
       [3, 4, 5]])

I would like to end up with:

array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

That's what I would ultimately like to end up with, but there is an extra requirement: I would also like to store either an array of the indices to discard, or of the indices to keep (à la numpy.take).

I am using NumPy 1.8.1.

codedog
  • You can count how many times each row appears using the methods suggested, for example, [here](http://stackoverflow.com/q/27000092/3923281) and [here](http://stackoverflow.com/q/33786245/3923281). I think that's what your problem here reduces to. – Alex Riley Dec 06 '15 at 21:51
  • @ajcr I can't use `return_counts` so #1 is out for me. Unfortunately #2 seems to require sorted array, and I need to preserve the order. – codedog Dec 06 '15 at 22:35
  • @codedog Were either of the answers helpful? If not, could you let us know what else you're looking for? – ilyas patanam Dec 08 '15 at 06:10

4 Answers

We want to find rows which are not duplicated in your array, while preserving the order.

I use this solution to combine each row of `a` into a single element, so that we can find the unique rows using `np.unique(b, return_index=True, return_inverse=True)`. Then I modified this function to output the counts of the unique rows using the index and inverse. From there, I can select all the unique rows which have `counts == 1`.

import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# use a flexible data type, np.void, to combine the columns of `a`;
# the size of the np.void is the number of bytes of one element of `a` multiplied by the number of columns
b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, inv = np.unique(b, return_index=True, return_inverse=True)

def return_counts(index, inv):
    # accumulate one count per unique row, using the inverse mapping
    count = np.zeros(len(index), dtype=int)
    np.add.at(count, inv, 1)
    return count

counts = return_counts(index, inv)

# collect j (the row index into `a`), sorted so the kept rows stay in their original order;
# if you want the indices to discard instead, replace the condition with: counts[i] > 1
index_keep = sorted(j for i, j in enumerate(index) if counts[i] == 1)

>>> a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

# if you don't need the indices and just want the array returned while preserving the order
a_unique = np.vstack([a[idx] for i, idx in enumerate(index) if counts[i] == 1])
>>> a_unique
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

For NumPy >= 1.9, where `np.unique` supports `return_counts`:

b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, counts = np.unique(b, return_index=True, return_counts=True)

# again, collect the row indices (into `a`) of the rows that occur exactly once
index_keep = sorted(j for i, j in enumerate(index) if counts[i] == 1)

>>> a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])
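
One caveat (my addition, not part of the original answer): the `np.void` view used above assumes `a` is C-contiguous; on a sliced or transposed array the view can raise a `ValueError`. A contiguous copy avoids this:

# make sure the rows are laid out contiguously before taking the void view
a = np.ascontiguousarray(a)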
ilyas patanam
  • except the request was to exclude [1,2,3] since it occurs more than once – Dan Patterson Dec 07 '15 at 04:18
  • @Dan Patterson Thanks for pointing it out, I have edited my solution. – ilyas patanam Dec 07 '15 at 17:31
  • Yesterday I found out we have implemented this in a C extension. I have not tested this solution explicitly but it looks very similar to what has been implemented here. That's why I accepted it as a solution. Thanks. – codedog Dec 08 '15 at 22:42

You can proceed as follows (note that `np.unique`'s `axis` keyword requires NumPy >= 1.13 and `return_counts` requires >= 1.9, both newer than the 1.8.1 mentioned in the question):

# Assuming your array is a
uniq, uniq_idx, counts = np.unique(a, axis=0, return_index=True, return_counts=True)

# to return the array you want
new_arr = uniq[counts == 1]

# The indices of non-unique rows
a_idx = np.arange(a.shape[0]) # the indices of array a
nuniq_idx = a_idx[np.in1d(a_idx, uniq_idx[counts==1], invert=True)] 

You get:

# new_arr
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

# nuniq_idx
array([0, 2])
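
One caveat (my addition): `np.unique` returns its rows in sorted order, so `uniq[counts == 1]` only happens to match the original row order here. To preserve the original order in general, index back into `a` with the sorted first-occurrence indices:

# indices (into a) of the rows that occur exactly once, in their original order
order = np.sort(uniq_idx[counts == 1])
new_arr_ordered = a[order]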

s.ouchene

If you want to delete all instances of the elements that exist in duplicate versions, you can iterate through the array, find the indices of the elements that occur more than once, and finally delete those:

import numpy

# The array to check:
array = numpy.array([[1, 2, 3],
                     [2, 3, 4],
                     [1, 2, 3],
                     [3, 2, 1],
                     [3, 4, 5]])

# List that will contain the indices of duplicates (which should be deleted)
deleteIndices = []

for i in range(len(array)):            # Loop through the entire array
    indices = list(range(len(array)))  # All indices in the array
    del indices[i]                     # All indices except the i'th row currently being checked

    for j in indices:                  # Loop through every other row
        if (array[i] == array[j]).all():  # Check if the i'th and j'th rows are equal
            deleteIndices.append(j)       # If so, mark the j'th row for deletion

# Sort deleteIndices in ascending order:
deleteIndices.sort()

# Delete duplicates
array = numpy.delete(array, deleteIndices, axis=0)

This outputs:

>>> array
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

>>> deleteIndices
[0, 2]

This way you both delete the duplicates and get a list of the indices to discard. Note that the nested loop makes this approach O(n²) in the number of rows, so it is best suited to small arrays.
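
The question also asks for the indices to keep, à la `numpy.take`. A small sketch of how to derive them from `deleteIndices` (my addition; `original` stands for a copy of the array saved before the `numpy.delete` call above):

# complementary "keep" list, computed against the original (pre-delete) array
keepIndices = [i for i in range(len(original)) if i not in deleteIndices]
kept = numpy.take(original, keepIndices, axis=0)  # same rows as the delete approach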

Johan E. T.

The numpy_indexed package (disclaimer: I am its author) can be used to solve such problems in a vectorized manner:

import numpy as np
import numpy_indexed as npi

index = npi.as_index(arr)    # arr is the input array
keep = index.count == 1      # boolean mask over the unique rows
discard = np.invert(keep)
print(index.unique[keep])
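
For completeness, a quick demo on the question's array (my addition; the output shown assumes `index.unique` returns the unique rows in sorted order):

>>> arr = np.array([[1, 2, 3], [2, 3, 4], [1, 2, 3], [3, 2, 1], [3, 4, 5]])
>>> index = npi.as_index(arr)
>>> index.unique[index.count == 1]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])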
Eelco Hoogendoorn