If I have an array

arr = np.array([[0,1],
                [1,2],
                [2,3],
                [4,3],
                [5,6],
                [3,4],
                [2,1],
                [6,7]])

how could I eliminate redundant rows where the column values may be swapped? In the example above, the code would reduce the array to

arr = np.array([[0,1],
                [1,2],
                [2,3],
                [4,3],
                [5,6],
                [6,7]])

I have thought about using a combination of the slice arr[:,::-1], np.all, and np.any, but what I have come up with so far simply gives me True or False per row when comparing rows, which doesn't discriminate between similar rows.

j = np.any([np.all(y == arr, axis=1) for y in arr[:,::-1]], axis=0)

which yields [False, True, False, True, False, True, True, False].

Thanks in advance.

Tim

4 Answers

Basically you want to Find Unique Rows, and these answers borrow heavily from the top two answers there - but you need to sort the rows first to eliminate different orders.

If you don't care about order of rows at the end, this is the short way (but slower than below):

np.vstack(list({tuple(row) for row in np.sort(arr,-1)}))

If you do want to maintain order, you can turn each sorted row into a void object and use np.unique with return_index:

b = np.ascontiguousarray(np.sort(arr,-1)).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
_, idx = np.unique(b, return_index=True)

unique_arr = arr[np.sort(idx)]  # sort the indices to keep the original row order

It might be tempting to use set row-wise instead of using np.sort(arr,-1) and np.void to make an object array, but this only works if there are no repeated values in rows. If there are, a row of [1,2,2] will be considered equivalent to a row of [1,1,2]: both become {1, 2}.
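As an alternative to the void view, np.unique also accepts an axis argument (this assumes NumPy 1.13 or later). The following is a minimal sketch of the same idea; the names keys and first_idx are illustrative and not part of the original answer.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

# Sort within each row so that [4, 3] and [3, 4] map to the same key.
keys = np.sort(arr, axis=1)

# axis=0 treats every row as one item; return_index gives the index of
# the first occurrence of each unique key.
_, first_idx = np.unique(keys, axis=0, return_index=True)

# Sorting the indices restores the original row order.
unique_arr = arr[np.sort(first_idx)]
print(unique_arr)
```

Because the rows are taken from arr rather than from keys, a kept row such as [4, 3] stays in its original column order.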

Daniel F
  • I had noticed the suggested post, but I was unsure about how it might work for my problem. I did have to modify your code a little because `arr.sort` sorts in place; if we change this to `np.sort(arr, -1)`, a new array is returned and it works great. – Tim Jul 10 '17 at 08:02
  • Edited answer to reflect your comment, @Tim – Daniel F Jul 10 '17 at 08:58

After getting the Boolean list, you can use the first block below to drop the rows where x and y are swapped.

To remove identical rows, you can use the second block.

# This block removes the rows where x and y are swapped, given the Boolean list j
j = [False, True, False, True, False, True, True, False]  # the Boolean list from the question
finalArray = []
for is_swapped, value in zip(j, arr):
    if not is_swapped:
        finalArray.append(value)


# This line removes identical rows
finalArray = [list(x) for x in set(tuple(x) for x in arr)]
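Note that the Boolean list from the question flags both rows of each swapped pair, so filtering on it alone drops [1, 2] as well as [2, 1]. One possible way around this, sketched below for the two-column case, is to flag a row only when it matches the reverse of an earlier row. The names swapped, redundant and final_array are illustrative, and the pairwise comparison needs memory proportional to the square of the row count.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

# swapped[i, k] is True when row i equals row k with its columns reversed.
swapped = np.all(arr[:, None, :] == arr[None, :, ::-1], axis=2)

# Keep only matches against earlier rows (k < i), so the first row of
# each swapped pair is never flagged.
redundant = np.any(np.tril(swapped, k=-1), axis=1)

final_array = arr[~redundant]
print(final_array)
```

This sketch only handles swapped rows; exact duplicates would still need a separate step such as the second block above.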

A solution without using numpy:

In [27]: result_ = set(tuple(sorted(row)) for row in arr)

In [28]: result = [list(i) for i in result_]

In [29]: result
Out[29]: [[0, 1], [1, 2], [6, 7], [5, 6], [2, 3], [3, 4]]
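Because a set has no defined order, the result above comes back shuffled, and each row is returned in sorted form ([3, 4] rather than the original [4, 3]). If the original order and orientation matter, a dict keyed on the sorted row may help. This is a minimal sketch that assumes Python 3.7 or later (where dicts keep insertion order) and a plain list of lists; first_seen is an illustrative name.

```python
arr = [[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]]

# Map each sorted row (as a hashable tuple) to the first original row
# that produced it; setdefault leaves later duplicates untouched.
first_seen = {}
for row in arr:
    first_seen.setdefault(tuple(sorted(row)), row)

# Dict values come back in insertion order, i.e. original row order.
result = list(first_seen.values())
print(result)
```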
Rahul K P

A solution using the numpy.lexsort routine:

import numpy as np

arr = np.array([
    [0,1], [1,2], [2,3], [4,3], [5,6], [3,4], [2,1], [6,7]
])
order = np.lexsort(arr.T)
a = arr[order]    # sorted rows
arr= a[[i for i,r in enumerate(a) if i == len(a)-1 or set(a[i]) != set(a[i+1])]]

print(arr)

The output:

[[0 1]
 [1 2]
 [2 3]
 [3 4]
 [5 6]
 [6 7]]
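One caveat: np.lexsort(arr.T) uses the last column as the primary key, so the two rows of a swapped pair end up adjacent only if no other row sorts between them. For instance, [1, 5] and [5, 1] would be separated by [3, 4]. A possible variant, sketched here as an assumption rather than as part of the answer, sorts within each row first so that swapped rows share one key; keys, is_first and result are illustrative names.

```python
import numpy as np

arr = np.array([[1, 5], [3, 4], [5, 1], [4, 3]])

# Sort within each row so swapped rows become identical keys.
keys = np.sort(arr, axis=1)

# lexsort treats the LAST key as primary, so reverse the column order
# to sort by the first column, then the second. The sort is stable.
order = np.lexsort(keys.T[::-1])
sorted_keys = keys[order]

# A row starts a new group when its key differs from the previous one.
is_first = np.ones(len(arr), dtype=bool)
is_first[1:] = np.any(sorted_keys[1:] != sorted_keys[:-1], axis=1)

# Map back to original positions and restore the original row order.
result = arr[np.sort(order[is_first])]
print(result)
```

Since the sort is stable, the row kept from each group is the one that appears first in arr.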
RomanPerekhrest