
I simply need to remove the rows that are repeated in an array but keep one copy of each. I can't use `unique` because I need to maintain the original order. Example:

1 a234 125
1 a123 265
1 a234 125
1 a145 167
1 a234 125    
2 a189 547
2 a189 547    
3 a678 567
3 a357 569

I need this output:

1 a234 125
1 a123 265
1 a145 167    
2 a189 547
3 a678 567
3 a357 569
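For reference, the same order-preserving deduplication can be sketched in plain Python. This is not the NumPy approach used in the answer. It assumes the rows are held as hashable tuples (the `rows` list and the `dedupe_keep_order` helper are hypothetical, typed in from the example data). A `set` remembers which rows have already been seen, and only the first occurrence of each row is kept:

```python
# Example rows from the question, written as tuples so they are hashable
rows = [
    (1, 'a234', 125),
    (1, 'a123', 265),
    (1, 'a234', 125),
    (1, 'a145', 167),
    (1, 'a234', 125),
    (2, 'a189', 547),
    (2, 'a189', 547),
    (3, 'a678', 567),
    (3, 'a357', 569),
]


def dedupe_keep_order(items):
    """Return items with repeats removed, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Print each surviving row space-separated, like the expected output
for row in dedupe_keep_order(rows):
    print(*row)
```

This works for any hashable rows and keeps the order in which rows first appear; for large NumPy arrays a vectorized approach is usually faster.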
Brian Tompsett - 汤莱恩
Alice

1 Answer


I think this does what you want. It uses `np.unique` with the `return_index` keyword argument:

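import numpy as np

# Mixed ints and strings: NumPy stores every entry as a string
a = np.array([[1, 'a234', 125],
              [2, 'b189', 547],
              [1, 'a234', 125],
              [3, 'c678', 567],
              [1, 'a234', 125],
              [2, 'b189', 547]])

# View each row (itemsize * number of columns bytes) as a single np.void element,
# so that np.unique compares whole rows rather than individual entries
b = a.ravel().view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))

# return_index=True also returns the index of the first occurrence of each unique row
_, unique_idx = np.unique(b, return_index=True)

# np.unique sorts its output, so sort the indices to restore the original row order
new_a = a[np.sort(unique_idx)]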

>>> new_a
array([['1', 'a234', '125'],
       ['2', 'b189', '547'],
       ['3', 'c678', '567']], 
      dtype='|S4')

The hackiest part is the view `b`, which turns each row into a single element of `np.void` dtype so that `np.unique` can compare full rows for equality.
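A small sketch to make the trick concrete: it repeats the answer's view on the same array, checks that `b` holds one element per row, and compares the result with `np.unique(..., axis=0)`. The `axis` argument is an assumption about your setup: it only exists in NumPy 1.13 and later. With `axis=0` the unique rows also come back sorted, so the indices still have to be sorted to restore the original order. The names `row_dtype`, `via_void` and `via_axis` are illustrative.

```python
import numpy as np

a = np.array([[1, 'a234', 125],
              [2, 'b189', 547],
              [1, 'a234', 125],
              [3, 'c678', 567],
              [1, 'a234', 125],
              [2, 'b189', 547]])

# Each row is itemsize * n_columns bytes; view it as one opaque np.void value
row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
b = a.ravel().view(row_dtype)
print(b.shape)  # one void element per row: (6,)

# Void-view approach from the answer
_, idx_void = np.unique(b, return_index=True)
via_void = a[np.sort(idx_void)]

# Alternative for NumPy >= 1.13: let np.unique compare rows directly
_, idx_axis = np.unique(a, axis=0, return_index=True)
via_axis = a[np.sort(idx_axis)]

# Both approaches keep the first occurrence of each row, in original order
print(np.array_equal(via_void, via_axis))
print(via_axis)
```

The `axis=0` form avoids relying on the raw byte layout of the array, which makes it easier to read; the void view remains useful on older NumPy versions.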

Jaime
  • @Ali_Sce You're not supposed to include the `>>>` and everything thereafter in your code. – sodd May 23 '13 at 16:29
  • I didn't do that... I am a beginner but I try to understand things ;) – Alice May 23 '13 at 16:34
    When I run it, I need to set `new_a = a[np.sort(unique_idx[1])]`; otherwise `unique_idx` would be a tuple of two arrays. For the rest it's perfect! Really hacky, at least for me! – Alice May 23 '13 at 16:44
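About the tuple mentioned in the last comment: with `return_index=True`, `np.unique` returns a tuple `(unique_values, indices)`. The answer's `_, unique_idx = ...` unpacks that tuple, so `unique_idx` is already the index array; indexing with `[1]` is presumably only needed when the whole result is assigned to a single name. A small sketch comparing the two spellings (the array `b` and the name `result` here are illustrative, not from the answer):

```python
import numpy as np

b = np.array([3, 1, 3, 2, 1])

# Assigned to one name: the result is a (unique_values, indices) tuple
result = np.unique(b, return_index=True)
print(len(result))  # 2

# Unpacked as in the answer: unique_idx is the index array itself
_, unique_idx = np.unique(b, return_index=True)

# Both spellings give the same first-occurrence indices
print(np.array_equal(unique_idx, result[1]))  # True

# Sorting the indices restores the order of first appearance
print(b[np.sort(unique_idx)])  # [3 1 2]
```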