Numpy Array Set Difference

Question

I have two numpy arrays that have overlapping rows:

import numpy as np

a = np.array([[1,2], [1,5], [3,4], [3,5], [4,1], [4,6]])
b = np.array([[1,5], [3,4], [4,6]])

You can assume that:

the rows are sorted
the rows within each array are unique
array b is always a subset of array a

I would like to get an array that contains all rows of a that are not in b.

i.e.,:

[[1 2]
 [3 5]
 [4 1]]

Considering that a and b can be very, very large, what is the most efficient method for solving this problem?

You mention the rows are sorted. Is the full array also sorted column-wise? — mtrw, Sep 04 '16 at 21:56
Other recent row set questions: (intersection) http://stackoverflow.com/questions/39218768/find-numpy-vectors-in-a-set-quickly/39220519#39220519, (union) http://stackoverflow.com/questions/39083549/python-2-d-array-get-the-function-as-np-unique-or-union1d — hpaulj, Sep 04 '16 at 21:57
Padraic - I think there are better duplicates than that. It dates from 2012, and there have been many questions about row sets or unique rows since then. — hpaulj, Sep 04 '16 at 22:20
@hpaulj, feel free to reopen and re-dupe, but if you look at the answer below, it seems to be almost a literal copy of the highest-rated answer http://stackoverflow.com/a/11903368/2141635 from the dupe. — Padraic Cunningham, Sep 05 '16 at 00:45

score 7 · Answer 1 · answered Sep 04 '16 at 21:27

Here's a possible solution to your problem:

import numpy as np

a = np.array([[1, 2], [3, 4], [3, 5], [4, 1], [4, 6]])
b = np.array([[3, 4], [4, 6]])

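# View each row as a single structured element so whole rows can be compared
a1_rows = a.view([('', a.dtype)] * a.shape[1])
a2_rows = b.view([('', b.dtype)] * b.shape[1])
# Take the set difference, then convert back to a plain 2-D array
c = np.setdiff1d(a1_rows, a2_rows).view(a.dtype).reshape(-1, a.shape[1])
print(c)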

I think numpy.setdiff1d is the right choice here: once each row is viewed as a single structured element, the 1-D set functions operate on whole rows.
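As a minimal sketch, the same steps can be wrapped in a helper and run on the arrays from the question. The name `setdiff_rows` is hypothetical, and the sketch assumes both inputs are 2-D with the same number of columns; `np.ascontiguousarray` is added because the view trick needs C-contiguous memory and a shared dtype.

```python
import numpy as np

def setdiff_rows(a, b):
    """Return the rows of `a` that do not appear in `b` (hypothetical helper)."""
    # The structured view requires C-contiguous data and matching dtypes.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)
    # One structured element per row, so each row compares as a single item.
    row_dtype = [('', a.dtype)] * a.shape[1]
    a_rows = a.view(row_dtype).ravel()
    b_rows = b.view(row_dtype).ravel()
    diff = np.setdiff1d(a_rows, b_rows)
    # Convert the structured result back to a plain 2-D array.
    return diff.view(a.dtype).reshape(-1, a.shape[1])

a = np.array([[1, 2], [1, 5], [3, 4], [3, 5], [4, 1], [4, 6]])
b = np.array([[1, 5], [3, 4], [4, 6]])
print(setdiff_rows(a, b))
```

With the default arguments, np.setdiff1d returns sorted, de-duplicated values, so the result comes back in row-sorted order.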

BPL

Use the `assume_unique=True` parameter if applicable. Also look at the code for this function and `np.in1d`. It might give ideas on how do things faster. Other `row set` questions have proposed other ways of converting `a` and `b` to 1d for use in these functions. – hpaulj Sep 05 '16 at 16:54
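A rough illustration of the `assume_unique=True` suggestion, assuming (as the question states) that neither array contains duplicate rows. The variable names are illustrative only. With this flag np.setdiff1d skips its internal np.unique calls, and the surviving rows keep the order they had in `a`.

```python
import numpy as np

a = np.array([[1, 2], [1, 5], [3, 4], [3, 5], [4, 1], [4, 6]])
b = np.array([[1, 5], [3, 4], [4, 6]])

# View each row as one structured element (requires C-contiguous arrays).
row_dtype = [('', a.dtype)] * a.shape[1]
a_rows = np.ascontiguousarray(a).view(row_dtype).ravel()
b_rows = np.ascontiguousarray(b).view(row_dtype).ravel()

# Only valid when the rows of each array are already unique.
c = np.setdiff1d(a_rows, b_rows, assume_unique=True)
c = c.view(a.dtype).reshape(-1, a.shape[1])
print(c)
```

If the uniqueness assumption does not hold, the result may contain repeated rows, so the default `assume_unique=False` is the safer choice in that case.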
