How to remove those rows of matrix A, which have equal values with matrix B in specified columns in Matlab?

Question

I have two matrices in Matlab, A and B, with the same number of columns but different numbers of rows. B has fewer rows than A; in fact, the rows of B are a subset of the rows of A.

How can I remove those rows efficiently from A, where the values in columns 1 and 2 of A are equal to the values in columns 1 and 2 of matrix B?

At the moment I'm doing this:

for k = 1:size(B, 1)
     A(find((A(:,1) == B(k,1) & A(:,2) == B(k,2))), :) = [];
end

and Matlab complains that this is inefficient and that I should try to use any, but I'm not sure how to do it with any. Can someone help me out with this? =)

I tried this, but it doesn't work:

A(any(A(:,1) == B(:,1) & A(:,2) == B(:,2), 2), :) = [];

It complains the following:

Error using  == 
Matrix dimensions must agree.

Example of what I want:

[Screenshot of the example matrices; image not available]

A-B in the results means that the rows of B are removed from A. The same goes with A-C.

`setdiff` is the best solution but to convert your first try to `any` (*keeping* your loop) this is what Matlab is suggesting (you'd actually want `all` and not `any` in your case): `A(all(A == B(k,:),2), :) = [];` — Dan, Jun 19 '14 at 06:36
+1 Thank you very much @Dan I will try all the solutions and post the performance times =) — jjepsuomi, Jun 19 '14 at 06:42
btw I didn't realize you were only comparing the first two columns so update my last comment to `A(all(A(:,1:2) == B(k,1:2),2), :) = [];` — Dan, Jun 19 '14 at 06:52
Thank you everybody for your fine answers =) The original running time (with my data) was: 0.198072 seconds. By using the `bsxfun` approaches I got a running time of approximately 0.007 seconds. By using `setdiff(A(:,1:2),B(:,1:2),'rows')` I got the running time: 0.004120 seconds. — jjepsuomi, Jun 19 '14 at 06:54
@jjepsuomi Hope you can do some benchmarks on bigger datasizes too, would be interesting to see those results too. — Divakar, Jun 19 '14 at 06:57
+1 @Divakar I will try with different data sets and post my results =) It will take few minutes =) — jjepsuomi, Jun 19 '14 at 06:59
@jjepsuomi Added one more `bsxfun` approach in my solution, so do you mind adding that too to your benchmark results? :) — Divakar, Jun 19 '14 at 07:36
Hi @Divakar I added the results for my datasets =) Okay I can add the one more `bsxfun` approach, just a sec =) — jjepsuomi, Jun 19 '14 at 07:46
Hi @Divakar I added your second approach as well =) It seems `setdiff` is beating the heck out of all for some reason (with the dataset I have available). Maybe the results could be different if I had much larger datasets? =) Thank anyway for everybody! =) Your solutions are all very good and the performance differences aren't that big that it would make a difference (at least in my case =)). — jjepsuomi, Jun 19 '14 at 07:56
@jjepsuomi I think the results certainly make sense, because `bsxfun` is known to be memory hungry, so with those huge datasizes, it's bound to get slower. `setdiff` with its definition looks perfect for this problem. Thank you for the results BTW! — Divakar, Jun 19 '14 at 08:27

bla · Accepted Answer · 2014-06-19T07:24:37.670

4

Try using setdiff. For example:

c=setdiff(a,b,'rows')

Note, if order is important use:

c = setdiff(a,b,'rows','stable')

Edit: reading the edited question and the comments to this answer, the specific usage of setdiff you look for is (as noticed by Shai):

[~, ia] = setdiff(a(:,1:2), b(:,1:2), 'rows', 'stable');  % ia: indices of the rows of a to keep
c = a(ia,:)

Alternative solution:

You can also just use ismember:

a(~ismember(a(:,1:2),b(:,1:2),'rows'),:)
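A small worked example may make the two-column comparison concrete. The matrices below are made up for illustration and are not taken from the question; the variable names `matchMask` and `C` are likewise just placeholders.

```matlab
% Hypothetical data: rows 2 and 4 of A match rows of B in columns 1 and 2.
% Row 2 differs from B's first row in column 3, so a whole-row comparison
% such as setdiff(A, B, 'rows') would not remove it.
A = [1 2 10;
     3 4 20;
     5 6 30;
     7 8 40];
B = [3 4 99;
     7 8 40];

% Logical mask: true where columns 1:2 of a row of A occur as a row of B(:,1:2).
matchMask = ismember(A(:,1:2), B(:,1:2), 'rows');

% Keep only the rows that did not match, with all columns intact.
C = A(~matchMask, :);
disp(C)
```

Here `C` ends up holding the rows `[1 2 10]` and `[5 6 30]`, in their original order.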

edited Jun 19 '14 at 07:24

answered Jun 19 '14 at 06:28

bla

+1 But don't you need `setdiff(A(:,1:2),B(:,1:2),'rows')` instead? – Divakar Jun 19 '14 at 06:42
When I wrote my answer there was an example in the question of two arrays similar to those in the answer that are now edited out. That what I always write: "for example,..." if you understand the answer you can apply it to the question anyway. – bla Jun 19 '14 at 07:00
@jjepsuomi Could post back on the screenshot image you had in the post before the edits? – Divakar Jun 19 '14 at 07:03
bygones Divakar :) ... the question was answered 3 times already. – bla Jun 19 '14 at 07:04
@natan I really thought the screenshot made it easier for everyone to understand. – Divakar Jun 19 '14 at 07:06
@Divakar I posted the pic, but there's some problem in the server I think, because it doesn't display it? – jjepsuomi Jun 19 '14 at 07:07
@natan well that would give you first two columns only as `c`. Look into [Shai's solution](http://stackoverflow.com/a/24300103/3293881), it has the correct setdiff implementation using the first two columns, I believe. – Divakar Jun 19 '14 at 07:09
@natan Sorry, it was messy, but for correctness, it was necessary I guess :) I think you can keep it, but just state the assumption that its for all columns and not just first and second column. Upto you! – Divakar Jun 19 '14 at 07:13
from all the mess I thought of an alternative solution with `ismember`... :) – bla Jun 19 '14 at 07:17
@natan haha way to avoid the mess! Out of +1s :) – Divakar Jun 19 '14 at 07:19

score 2 · Answer 2 · edited May 23 '17 at 12:21

Use bsxfun:

compare = bsxfun( @eq, permute( A(:,1:2), [1 3 2]), permute( B(:,1:2), [3 1 2] ) );
twoEq = all( compare, 3 );
toRemove = any( twoEq, 2 ); 
A( toRemove, : ) = [];

Explaining the code:

First we use bsxfun to compare all pairs of rows in the first two columns of A and B, resulting in compare of size numRowsA-by-numRowsB-by-2, where compare( ii, jj, kk ) is true exactly when A(ii,kk) == B(jj,kk).
Then we use all to create twoEq of size numRowsA-by-numRowsB, where each entry indicates whether both corresponding entries of A and B are equal.
Finally, we use any to select the rows of A that match at least one row of B.
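To see the intermediate sizes, the same three steps can be run on a tiny example. The data here is invented for illustration. Note that compare holds numRowsA * numRowsB * 2 logical values, so this approach can use a lot of memory when both matrices are large.

```matlab
% Hypothetical data: 3 rows in A, 2 rows in B.
A = [1 2 10;
     3 4 20;
     5 6 30];
B = [3 4 99;
     9 9  9];

% A(:,1:2) becomes 3-by-1-by-2 and B(:,1:2) becomes 1-by-2-by-2,
% so bsxfun expands them to a 3-by-2-by-2 logical array.
compare = bsxfun(@eq, permute(A(:,1:2), [1 3 2]), permute(B(:,1:2), [3 1 2]));
disp(size(compare))

% Collapse the third dimension: twoEq(ii,jj) is true when row ii of A
% matches row jj of B in both compared columns.
twoEq = all(compare, 3);

% Collapse the second dimension: true for rows of A matching any row of B.
toRemove = any(twoEq, 2);

A(toRemove, :) = [];
disp(A)
```

Only the second row of A matches a row of B in both columns, so it is the only row removed.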

What's wrong with original code:

By removing rows of A inside a loop (i.e., A( ... ) = []) you are resizing A at almost every iteration. See this post on why exactly this is bad practice.
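If the loop is kept, one way to avoid the repeated resizing is to collect the matches in a preallocated logical mask and delete the rows once at the end. This is only a sketch on made-up data; the name `toRemove` is a placeholder.

```matlab
% Hypothetical data for illustration.
A = [1 2 10;
     3 4 20;
     5 6 30;
     7 8 40];
B = [3 4 99;
     7 8 40];

% Preallocate a mask with one entry per row of A.
toRemove = false(size(A, 1), 1);

for k = 1:size(B, 1)
    % Mark rows of A whose first two columns equal those of row k of B.
    toRemove = toRemove | (A(:,1) == B(k,1) & A(:,2) == B(k,2));
end

% A is resized only once, after the loop.
A(toRemove, :) = [];
disp(A)
```

The comparisons inside the loop are the same as in the question; only the deletion has moved out of the loop.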

Using `setdiff`

In order to use setdiff (as suggested by natan) on only the first two columns, you'll need to use its second output argument:

[ignore, ia] = setdiff( A(:,1:2), B(:,1:2), 'rows', 'stable' );
A = A( ia, : ); % keeping only relevant rows, beyond first two columns.
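One caveat worth checking on your own data: setdiff returns its result without repetitions, so if several rows of A share the same values in the first two columns, only one of them survives. A mask built with ismember keeps all of them. The sketch below uses invented data to show the difference.

```matlab
% Hypothetical data: the first two rows of A share the key [1 2].
A = [1 2 10;
     1 2 11;
     3 4 20];
B = [3 4 99];

% setdiff treats the two-column rows as a set, so the key [1 2] appears once.
[~, ia] = setdiff(A(:,1:2), B(:,1:2), 'rows', 'stable');
viaSetdiff = A(ia, :);

% ismember tests every row of A separately, so both [1 2 ...] rows are kept.
viaIsmember = A(~ismember(A(:,1:2), B(:,1:2), 'rows'), :);

disp(size(viaSetdiff, 1))    % number of rows kept by setdiff
disp(size(viaIsmember, 1))   % number of rows kept by ismember
```

If the first two columns of A never repeat, the two approaches keep the same rows.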

Appreciate your effort =) Your answer is great! =) – jjepsuomi Jun 19 '14 at 06:36

score 2 · Answer 3 · edited May 23 '17 at 11:51

Here's another bsxfun implementation -

A(~any(squeeze(all(bsxfun(@eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2)),2),:)

One more that is dangerously close to Shai's solution, but needs only one permute call instead of two -

A(~any(all(bsxfun(@eq,A(:,1:2),permute(B(:,1:2),[3 2 1])),2),3),:)
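To check that the one-permute expression agrees with an ismember mask, and to time both on your own machine, something like the following could be used. The data sizes and value range are arbitrary assumptions, and the handle names are placeholders; timings will vary with data size and Matlab version.

```matlab
% Made-up test data; nA, nB and the value range are arbitrary assumptions.
nA = 2000;
nB = 500;
A = randi(50, nA, 3);
B = A(randperm(nA, nB), :);   % rows of B are drawn from A

% Wrap each approach in a zero-argument function handle.
viaIsmember = @() A(~ismember(A(:,1:2), B(:,1:2), 'rows'), :);
viaBsxfun   = @() A(~any(all(bsxfun(@eq, A(:,1:2), permute(B(:,1:2), [3 2 1])), 2), 3), :);

% Both keep every row of A whose first two columns match no row of B.
sameResult = isequal(viaIsmember(), viaBsxfun());

% timeit calls each handle several times and returns a typical run time in seconds.
tIsmember = timeit(viaIsmember);
tBsxfun   = timeit(viaBsxfun);

fprintf('same result: %d, ismember: %.6f s, bsxfun: %.6f s\n', ...
        double(sameResult), tIsmember, tBsxfun);
```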

answered Jun 19 '14 at 06:39

Divakar

@natan haha Thanks and likewise here :) – Divakar Jun 19 '14 at 07:23