Eliminating redundant numpy rows

Question

If I have an array

arr = [[0,1],
       [1,2],
       [2,3],
       [4,3],
       [5,6],
       [3,4],
       [2,1],
       [6,7]]

how could I eliminate redundant rows where the column values may be swapped? In the example above, the code would reduce the array to

arr = [[0,1],
       [1,2],
       [2,3],
       [4,3],
       [5,6],
       [6,7]]

I have thought about using a combination of slicing arr[:,::-1], np.all, and np.any, but what I have come up with so far simply gives me True or False per row when comparing rows, and this doesn't discriminate between similar rows.

j = np.any([np.all(y==arr, axis=1) for y in arr[:,::-1]], axis=0)

which yields [False, True, False, True, False, True, True, False].

Thanks in advance.

Not quite a dupe, as the order can change. – Daniel F, Jul 10 '17 at 07:16

Daniel F · Accepted Answer · 2017-07-10T08:57:56.813

Basically you want to Find Unique Rows, and these answers borrow heavily from the top two answers there - but you need to sort the rows first to eliminate different orders.

If you don't care about order of rows at the end, this is the short way (but slower than below):

# list(...) because np.vstack expects a sequence, not a set
np.vstack(list({tuple(row) for row in np.sort(arr,-1)}))
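
As a rough, self-contained sketch of the short approach on the question's array (the names `unique_rows` and `result` are only illustrative): each row is sorted so that swapped pairs collapse to the same tuple, and a set drops the repeats. Note that the rows come back in their sorted form, in no guaranteed order.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

# Sort inside each row so [4, 3] and [3, 4] both become (3, 4),
# then let a set discard the duplicate tuples.
unique_rows = {tuple(row) for row in np.sort(arr, axis=-1)}

# Recent NumPy releases want a sequence here, hence the list().
result = np.vstack(list(unique_rows))

print(result.shape)
```

Because a set is used, the row order of `result` should not be relied on.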

If you do want to maintain order, you can turn each sorted row into a void object and use np.unique with return_index:

b = np.ascontiguousarray(np.sort(arr,-1)).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
_, idx = np.unique(b, return_index=True)

# idx follows the sorted order of the unique keys; sort it to keep the original row order
unique_arr = arr[np.sort(idx)]
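
A possible alternative to the void view, assuming NumPy 1.13 or newer where np.unique accepts an `axis` argument: the sketch below is not the method above but a swapped-in variant that should give the same first-occurrence indices. The name `canonical` is only illustrative.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

# Canonical form: each row sorted, so swapped pairs look identical.
canonical = np.sort(arr, axis=1)

# Indices of the first occurrence of every distinct canonical row.
_, idx = np.unique(canonical, axis=0, return_index=True)

# Sorting idx restores the order in which rows appear in arr,
# and indexing arr (not canonical) keeps the rows as originally written.
unique_arr = arr[np.sort(idx)]

print(unique_arr)
```

Indexing the original `arr` is what keeps `[4, 3]` as written instead of turning it into `[3, 4]`.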

It might be tempting to use set row-wise instead of using np.sort(arr,-1) and np.void to make an object array, but this only works if there are no repeated values in rows. If there are, a row of [1,2,2] will be considered equivalent to a row of [1,1,2], since both become the set {1, 2}.
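
A small sketch of that pitfall, using two made-up rows: comparing rows as sets loses the repeat counts, while comparing sorted tuples does not.

```python
rows = [[1, 2, 2], [1, 1, 2]]

# As sets, both rows collapse to {1, 2}.
as_sets = [set(r) for r in rows]
sets_equal = as_sets[0] == as_sets[1]

# As sorted tuples, the repeated values are preserved.
as_sorted = [tuple(sorted(r)) for r in rows]
sorted_equal = as_sorted[0] == as_sorted[1]

print(sets_equal, sorted_equal)
```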

I had noticed the suggested post, but I was unsure about how it might work for my solution. I did have to modify your code a little because `arr.sort` does an internal sort; if we change this to `np.sort(arr, -1)`, a new array is output and it works great. — Tim, Jul 10 '17 at 08:02

Shakar Bhattarai · Answer 2 · 2017-07-10T08:12:15.277


After getting the boolean list, you can use the first block below to filter out the rows where x and y are swapped.

To remove identical rows, you can use the second block.

# This block removes the rows where x and y are swapped, given the list j
j = [True, False, ...]  # your boolean list
finalArray = []
for flag, value in zip(j, arr):
    # keep only the rows that were not flagged
    if not flag:
        finalArray.append(value)


# This code removes identical rows
finalArray = [list(x) for x in set(tuple(x) for x in arr)]
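
For illustration, the filtering loop in this answer can also be written with NumPy boolean indexing. The sketch below recomputes the question's mask `j` on `arr`; the name `kept` is only illustrative. One thing it makes visible is that the mask flags both members of a swapped pair, so both are dropped rather than one of them.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

# True for every row that equals the reverse of some row in arr.
j = np.any([np.all(y == arr, axis=1) for y in arr[:, ::-1]], axis=0)

# Keep only the rows that were not flagged.
kept = arr[~j]

print(kept)
```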

edited Jul 10 '17 at 08:12

answered Jul 10 '17 at 07:03

Shakar Bhattarai


I am a little confused by the last line. You changed variables on me. Is the `a` supposed to be `arr`? – Tim Jul 10 '17 at 08:04

Yes, sorry about that. Corrected – Shakar Bhattarai Jul 10 '17 at 08:12

score 1 · Answer 3 · answered Jul 10 '17 at 07:28


A solution without using numpy:

In [27]: result_ = set(([tuple(sorted(row)) for row in arr]))

In [28]: result = [list(i) for i in result_]

In [29]: result
Out[29]: [[0, 1], [1, 2], [6, 7], [5, 6], [2, 3], [3, 4]]
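
If the original row order and the original orientation of each row matter, a plain-Python sketch along these lines might be used instead; the names `seen` and `key` are only illustrative. It keeps the first row seen for each pair of values.

```python
arr = [[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]]

seen = set()   # canonical (sorted) forms already encountered
result = []
for row in arr:
    key = tuple(sorted(row))   # [4, 3] and [3, 4] share the key (3, 4)
    if key not in seen:
        seen.add(key)
        result.append(row)     # keep the row as originally written

print(result)
```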


Rahul K P


score 1 · Answer 4 · answered Jul 10 '17 at 07:41

A solution using the numpy.lexsort routine:

import numpy as np

arr = np.array([
    [0,1], [1,2], [2,3], [4,3], [5,6], [3,4], [2,1], [6,7]
])
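order = np.lexsort(arr.T)   # indirect sort: by last column, then by first column
a = arr[order]              # sorted rows
# keep a row unless the next sorted row holds the same set of values
arr = a[[i for i, r in enumerate(a) if i == len(a)-1 or set(a[i]) != set(a[i+1])]]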

print(arr)

The output:

[[0 1]
 [1 2]
 [2 3]
 [3 4]
 [5 6]
 [6 7]]
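
The lexsort approach relies on swapped rows ending up next to each other, which holds for the sample data but not in general (for example `[1,5]` and `[5,1]` with `[3,2]` between them). A hedged variant, not part of the answer above, is to lexsort the row-sorted keys instead; the names `key`, `k`, and `keep` are only illustrative.

```python
import numpy as np

arr = np.array([[1, 5], [3, 2], [5, 1], [2, 3]])

# Canonical key per row: values sorted within the row.
key = np.sort(arr, axis=1)

# lexsort treats the last key as primary, so reverse the columns
# to sort by first column, then second. The sort is stable.
order = np.lexsort(key.T[::-1])
k = key[order]

# A row starts a new group if it differs from the previous sorted key.
keep = np.ones(len(k), dtype=bool)
keep[1:] = np.any(k[1:] != k[:-1], axis=1)

# First occurrence of each group, restored to original row order.
result = arr[np.sort(order[keep])]

print(result)
```

Because the keys are identical for swapped rows, duplicates are guaranteed to be adjacent after sorting, and comparing keys element-wise avoids the repeated-value problem of comparing sets.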
