I am looking for non-repeated rows in a matrix.

Assume:

A =

 8     1
 2     2
 2     2
 2     2
 2     2
 3     6
 5     7
 5     7

I would like to get "B" which is:

B=

 8     1
 3     6

Note that C = unique(A,'rows') returns every distinct row of "A" exactly once, whether or not that row was repeated; it only collapses the duplicates. That is:

C =

 2     2
 3     6
 5     7
 8     1

"C" is not the one that I am looking for.

Any help would be greatly appreciated!

Iman

2 Answers

Use the second and third outputs of unique as follows:

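[~, ii, jj] = unique(A,'rows');       % ii: one representative row index per distinct row; jj: group label of each row of A
kk = find(histc(jj,unique(jj))==1);   % groups that occur exactly once
B = A(sort(ii(kk)),:);                % sort the indices to keep the original row order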

Or use this more direct bsxfun-based approach:

B = A(sum(squeeze(all(bsxfun(@eq, A.', permute(A, [2 3 1])))))==1,:);

These two approaches work quite generally: A may have any number of columns, and may contain non-integer values.
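
For reference, here is a minimal sketch of the same counting idea applied to the example matrix from the question, using accumarray to count the group labels instead of histc (which newer MATLAB releases discourage). This is a swapped-in variant, not the code above; the variable name counts is only illustrative.

```matlab
% Example matrix from the question
A = [8 1; 2 2; 2 2; 2 2; 2 2; 3 6; 5 7; 5 7];

[~, ~, jj] = unique(A, 'rows');   % jj(k) is the group label of row k of A
counts = accumarray(jj(:), 1);    % counts(g) = number of rows with label g
B = A(counts(jj) == 1, :);        % keep rows whose group occurs exactly once

disp(B)   % rows [8 1] and [3 6], in their original order
```

Because the mask counts(jj) == 1 is indexed by the rows of A themselves, the original row order is preserved without a separate sort.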


If A always has two columns and contains only positive integer values (they are used as subscripts), you can also do it with accumarray, using the sparse option (sixth input argument) to save memory in case of large values:

[ii jj] = find(accumarray(A, 1, [], @sum, 0, true)==1);
B = [ii jj];

Or you can use sparse instead of accumarray:

[ii jj] = find(sparse(A(:,1),A(:,2),1)==1);
B = [ii jj];
Luis Mendo
  • Think you can save on the transpose in the bsxfun approach - `A(sum(squeeze(all(bsxfun(@eq, A, permute(A, [3 2 1])),2)))==1,:)` +1 of course for that :) – Divakar May 30 '14 at 17:05
  • @Divakar Thanks! You're right. But saving that `.'` introduces one `,2`. I tend to prefer `.'`. Besides, it may serve to call people's attention on [the meaning of `.'`](http://stackoverflow.com/a/23510668/2586922) – Luis Mendo May 30 '14 at 17:10
  • haha cheeky that one! So trying to avoid sum(..,2), is that a performance thing? Because there must be some overhead for transpose, but is sum(..,2) over sum() greater than that overhead? I have no idea about the performance variations on these. – Divakar May 30 '14 at 17:11
  • @Divakar Summing along cols is faster than summing along rows or higher dims. I'm pretty sure about that. And the reason is Matlab stores arrays in column-major order. But the savings should be balanced against the time needed to do the transpose. – Luis Mendo May 30 '14 at 17:17

If you don't care about the order of the rows, try this:

[C,~,ic] = unique(A,'rows','legacy')
B = C(histc(ic,unique(ic))==1,:)
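
To make the ordering caveat concrete, here is a small sketch that runs the two lines above on the example matrix from the question. The semicolons are added only to suppress intermediate output.

```matlab
% Example matrix from the question
A = [8 1; 2 2; 2 2; 2 2; 2 2; 3 6; 5 7; 5 7];

[C, ~, ic] = unique(A, 'rows', 'legacy');   % C holds the distinct rows, sorted
B = C(histc(ic, unique(ic)) == 1, :);       % keep the rows of C that occur once in A

disp(B)   % rows [3 6] and [8 1]: sorted order, not the order in which they appear in A
```

Since B is taken from the sorted matrix C rather than from A, the non-repeated rows come out in sorted order.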
Divakar