NumPy: how to delete rows common to two matrices

Question

The problem is very simple: I have two 2D np.arrays, and I want a third array that contains only the rows of the first array that do not appear in the second.

For example:

X = np.array([[0,1],[1,2],[4,5],[5,6],[8,9],[9,10]])
Y = np.array([[5,6],[9,10]])

Z = function(X,Y)
Z = array([[0, 1],
          [1, 2],
          [4, 5],
          [8, 9]])

I tried np.delete(X, Y, axis=0), but it doesn't work.
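np.delete expects the indices of the rows to remove, not the rows themselves, so passing Y makes NumPy treat its values as row numbers. A minimal sketch, assuming exact (integer) row matches: compare every row of X against every row of Y with broadcasting, then pass the indices of the matching rows to np.delete. The names matches and rows_to_drop are only illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 2], [4, 5], [5, 6], [8, 9], [9, 10]])
Y = np.array([[5, 6], [9, 10]])

# X[:, None, :] has shape (6, 1, 2) and Y[None, :, :] has shape (1, 2, 2), so the
# comparison broadcasts to (6, 2, 2): every row of X against every row of Y.
matches = (X[:, None, :] == Y[None, :, :]).all(axis=2)  # (6, 2): X[i] equals Y[j]

# A row of X is dropped if it equals at least one row of Y.
rows_to_drop = np.flatnonzero(matches.any(axis=1))  # row indices, which np.delete expects

Z = np.delete(X, rows_to_drop, axis=0)
print(Z)
```

The intermediate comparison array has len(X) * len(Y) * X.shape[1] elements, so this suits small inputs. X[~matches.any(axis=1)] gives the same result without np.delete.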

score 2 · Accepted Answer · answered Apr 23 '17 at 14:03

# keep each row of X that does not exactly match some row of Y
Z = np.vstack([row for row in X if not (Y == row).all(axis=1).any()])
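Membership tests on NumPy arrays are element-wise: row in Y evaluates (Y == row).any(), which is true as soon as any single element of Y equals the element of row in the same column, not only when a whole row matches. A small check with a hypothetical row [5, 0], which is not a row of Y, shows the difference between that test and a whole-row comparison:

```python
import numpy as np

Y = np.array([[5, 6], [9, 10]])
row = np.array([5, 0])  # hypothetical row that is NOT a row of Y

# ndarray.__contains__ is (Y == row).any(): one matching element is enough.
elementwise = row in Y

# Whole-row membership: every column must match for at least one row of Y.
whole_row = bool((Y == row).all(axis=1).any())

print(elementwise, whole_row)  # True False
```

In the question's example no such partial match occurs, which is why an element-wise test can appear to work there.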

Luchko

Note that this solution requires a number of operations equal to the product of the size of both sets, which is far from ideal. – Eelco Hoogendoorn Apr 23 '17 at 18:51
@EelcoHoogendoorn, that may be, but it's almost 10 times faster than `Z = npi.difference(X, Y)`; you can check it yourself :) – Luchko Apr 24 '17 at 01:46
For the tiny example array of the OP, I don't doubt it at all – Eelco Hoogendoorn Apr 24 '17 at 04:58
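To avoid the len(X) * len(Y) comparisons mentioned above without a third-party package, one option (not taken from either answer) is to hash the rows of Y once and then look up each row of X in that set, which takes roughly linear time on average. A sketch, assuming rows can be compared as exact tuples (fine for integers; floats would need rounding first); setdiff_rows is a hypothetical helper name:

```python
import numpy as np

def setdiff_rows(X, Y):
    """Return the rows of X that do not appear in Y, keeping X's order and duplicates."""
    # Build the lookup set once: one hash insertion per row of Y.
    y_rows = {tuple(r) for r in Y.tolist()}
    # One average O(1) set lookup per row of X.
    keep = np.fromiter((tuple(r) not in y_rows for r in X.tolist()),
                       dtype=bool, count=len(X))
    return X[keep]

X = np.array([[0, 1], [1, 2], [4, 5], [5, 6], [8, 9], [9, 10]])
Y = np.array([[5, 6], [9, 10]])
print(setdiff_rows(X, Y))
```

Boolean indexing also returns a valid empty (0, 2) array when every row is removed, a case in which np.vstack on an empty list raises an error.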

score 1 · Answer 2 · answered Apr 23 '17 at 18:49

The numpy_indexed package (disclaimer: I am its author) extends the standard numpy array set operations to multi-dimensional use cases such as these, with good efficiency:

import numpy_indexed as npi
Z = npi.difference(X, Y)
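For readers who prefer not to add a dependency, a comparable sort-based result can be sketched with plain NumPy (1.13 or later, for np.unique(..., axis=0)). This is not how numpy_indexed is implemented; it only illustrates a similar idea: label every distinct row with an integer, then run an ordinary 1-D membership test on the labels. difference_rows is a hypothetical name.

```python
import numpy as np

def difference_rows(X, Y):
    """Rows of X not present in Y, in X's original order."""
    # Give every distinct row of X and Y a shared integer label (sort-based).
    _, labels = np.unique(np.vstack([X, Y]), axis=0, return_inverse=True)
    labels = labels.ravel()  # guard against NumPy versions that return a 2-D inverse
    x_labels, y_labels = labels[:len(X)], labels[len(X):]
    # 1-D membership test on the integer labels instead of on whole rows.
    return X[~np.isin(x_labels, y_labels)]

X = np.array([[0, 1], [1, 2], [4, 5], [5, 6], [8, 9], [9, 10]])
Y = np.array([[5, 6], [9, 10]])
print(difference_rows(X, Y))
```

This version keeps duplicate rows and the original row order of X.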

Eelco Hoogendoorn

score 0 · Answer 3 · answered Apr 23 '17 at 14:19

Here's a views-based approach:

import numpy as np

# Based on http://stackoverflow.com/a/41417343/3293881 by @Eric
def setdiff2d(a, b):
    # check that casting to void will create equal size elements
    assert a.shape[1:] == b.shape[1:]
    assert a.dtype == b.dtype

    # compute dtypes
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    orig_dt = np.dtype((a.dtype, a.shape[1:]))

    # convert to 1d void arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    a_void = a.reshape(a.shape[0], -1).view(void_dt)
    b_void = b.reshape(b.shape[0], -1).view(void_dt)

    # Rows of a that are not in b (np.setdiff1d also sorts the result and removes duplicates)
    return np.setdiff1d(a_void, b_void).view(orig_dt)

Sample run:

In [81]: X
Out[81]: 
array([[ 0,  1],
       [ 1,  2],
       [ 4,  5],
       [ 5,  6],
       [ 8,  9],
       [ 9, 10]])

In [82]: Y
Out[82]: 
array([[ 5,  6],
       [ 9, 10]])

In [83]: setdiff2d(X,Y)
Out[83]: 
array([[0, 1],
       [1, 2],
       [4, 5],
       [8, 9]])
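Because setdiff2d is built on np.setdiff1d, it inherits that function's semantics: the result is the sorted set of unique values, so duplicate rows of a collapse to one, and the rows come back in the sorted order of their void representation rather than in a's original order. In the sample run above the two orders happen to coincide. The 1-D behavior is easy to see:

```python
import numpy as np

# np.setdiff1d returns the unique values of the first input that are not in the second, sorted.
result = np.setdiff1d([3, 1, 3, 2], [2])
print(result)  # [1 3]
```

The repeated 3 appears once, and 1 comes before 3 even though the input listed 3 first.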

score -1 · Answer 4 · answered Apr 23 '17 at 14:09

Z = np.unique([tuple(row) for row in X + Y])
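As written, this line does not run on the arrays from the question: + between NumPy arrays is element-wise addition, and shapes (6, 2) and (2, 2) cannot be broadcast together, so NumPy raises a ValueError. Even stacking the rows instead of adding them would produce the union of the rows of X and Y, not the rows of X that are missing from Y, as this check shows:

```python
import numpy as np

X = np.array([[0, 1], [1, 2], [4, 5], [5, 6], [8, 9], [9, 10]])
Y = np.array([[5, 6], [9, 10]])

# Element-wise addition needs broadcast-compatible shapes; (6, 2) and (2, 2) are not.
try:
    X + Y
    raised = False
except ValueError:
    raised = True

# Stacking and then deduplicating keeps every distinct row: a union, not a difference.
union = np.unique(np.vstack([X, Y]), axis=0)
print(raised, union.shape)  # True (6, 2)
```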

Reaper
