Numpy delete repeated rows

Question

I need to remove the repeated rows from an array while keeping one copy of each. I can't use unique because I need to maintain the original order. Example

I need this output

Are your rows distinguishable by one, say the first, column entry? — Jan, May 23 '13 at 15:46
See question regarding this [here](http://stackoverflow.com/questions/12926898/numpy-unique-without-sort). — sodd, May 23 '13 at 16:03

score 5 · Accepted Answer · answered May 23 '13 at 16:13

I think this does what you want, and it uses np.unique with the return_index keyword argument:

import numpy as np

a = np.array([[1, 'a234', 125],
              [2, 'b189', 547],
              [1, 'a234', 125],
              [3, 'c678', 567],
              [1, 'a234', 125],
              [2, 'b189', 547]])

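# View each row as a single opaque np.void element so np.unique compares whole rows.
b = a.ravel().view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))
# return_index gives the index of the first occurrence of each unique row.
_, unique_idx = np.unique(b, return_index=True)

# Sorting those indices restores the original row order.
new_a = a[np.sort(unique_idx)]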

>>> new_a
array([['1', 'a234', '125'],
       ['2', 'b189', '547'],
       ['3', 'c678', '567']], 
      dtype='|S4')

The hackiest part is the view b, which turns each row into a single element of np.void dtype, so that np.unique can compare full rows for equality.
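For comparison, here is a minimal sketch of the same idea without building the view by hand. It assumes NumPy 1.13 or later, where np.unique accepts an axis argument, and it uses a small integer array (`rows`, a made-up example, not the data from the question) so the result is easy to check. The names `first_idx` and `deduped` are illustrative only.

```python
import numpy as np

# Hypothetical sample data: rows 0 and 2 repeat, rows 1 and 4 repeat.
rows = np.array([[3, 9],
                 [1, 5],
                 [3, 9],
                 [2, 7],
                 [1, 5]])

# axis=0 makes np.unique treat each row as one item;
# return_index gives the index of the first occurrence of each unique row.
_, first_idx = np.unique(rows, axis=0, return_index=True)

# np.unique returns rows in sorted order, so sort the indices
# to get the rows back in their original order of appearance.
deduped = rows[np.sort(first_idx)]
print(deduped)
```

As in the answer above, the key step is sorting the returned indices rather than using the sorted unique rows directly.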

answered May 23 '13 at 16:13

Jaime

@Ali_Sce You're not supposed to include the `>>>` and everything thereafter in your code. – sodd May 23 '13 at 16:29
I didn't do that... I am a beginner but I try to understand things ;) – Alice May 23 '13 at 16:34
When I run it I need to set `new_a = a[np.sort(unique_idx[1])]`; otherwise `unique_idx` would be a tuple of two arrays. For the rest it's perfect! Really hacky, at least for me! – Alice May 23 '13 at 16:44
