Split up a matrix based on numbers in two columns

Question

I have a large matrix in which I want to separate the rows based on the values of two columns. For example:

M=[ 1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4]

I want the rows to be separated by the common values seen in columns 1 and 3 simultaneously. This is the expected output:

A= [1 3 5 6; 1 8 5 1]
B= [3 6 5 4]
C= [4 6 5 7]
D= [3 6 4 5; 3 6 4 4]

I've tried the following. However, it only separates it by a single column:

A = arrayfun(@(x) M(M(:,1) == x, :), unique(M(:,1)), 'uniformoutput', false)

rayryeng · Accepted Answer · 2015-02-12T05:03:20.510

It's my understanding that you want to extract the rows of a matrix that share common values in the first and third columns. You can certainly use the arrayfun approach, but you'll need to modify how you call unique. Specifically, use the 'rows' flag on the columns that you want to examine. Then, inside your arrayfun call, use bsxfun combined with all to check for those rows that contain each unique combination of column values you want, so:

>> M=[ 1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4];
>> r = unique(M(:,[1 3]), 'rows', 'stable');
>> A = arrayfun(@(x) M(all(bsxfun(@eq, M(:,[1 3]), r(x,:)),2), :), 1:size(r,1),'uni', 0);
>> celldisp(A);

A{1} =

     1     3     5     6
     1     8     5     1

A{2} =

     3     6     5     4

A{3} =

     4     6     5     7

A{4} =

     3     6     4     5
     3     6     4     4

It's important to note that I use the 'stable' flag, because unique by default sorts the unique entries. 'stable' ensures that we find unique entries in the order in which we encounter them, like the desired output that is seen in your question. BTW, 'uni' is short for 'UniformOutput'. It'll save you typing in the long run :).
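To see what the 'stable' flag changes, here's a small sketch using the M from the question. The variable names rSorted and rStable are just mine for illustration:

```matlab
M = [1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4];

% Default behaviour: the unique (col 1, col 3) pairs come back sorted by row
rSorted = unique(M(:,[1 3]), 'rows');
% rSorted = [1 5; 3 4; 3 5; 4 5]

% 'stable': the unique pairs in order of first appearance in M
rStable = unique(M(:,[1 3]), 'rows', 'stable');
% rStable = [1 5; 3 5; 4 5; 3 4]
```

With the default ordering, the [3 4] group would come out second instead of last, which doesn't match the A, B, C, D order in the question.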

The third line is quite a mouthful, but very easy to explain. First, take a look at this statement:

bsxfun(@eq, M(:,[1 3]), r(x,:))

Here, we take each row of r, which is a unique pairing of values taken from columns 1 and 3 of M, and check whether the values in columns 1 and 3 of every row of M equal the values at the corresponding positions in that row of r. For a row of M to match, all values in its row of this result must be 1, which is why we need to use all here and look across the columns:

all(bsxfun(@eq, M(:,[1 3]), r(x,:)),2)

Once we find the rows that match a particular row in r, we use this logical mask to index into our matrix M and extract those rows:

M(all(bsxfun(@eq, M(:,[1 3]), r(x,:)),2), :)

x iterates from 1 up to as many rows as there are in r, and at each iteration we pick out one unique row of r. The end result will be a cell array that groups rows of M based on common values in the first and third columns.
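If it helps, here is the same thing unrolled for just the first unique pair, so you can inspect each intermediate result. The names eqMat, mask and rows are mine and purely illustrative:

```matlab
M = [1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4];
r = unique(M(:,[1 3]), 'rows', 'stable');   % r(1,:) is [1 5]

% Compare columns 1 and 3 of every row of M against the pair [1 5]
eqMat = bsxfun(@eq, M(:,[1 3]), r(1,:));
% eqMat = [1 1; 0 1; 1 1; 0 1; 0 0; 0 0]

% A row matches only if BOTH columns are equal
mask = all(eqMat, 2);
% mask = [1; 0; 1; 0; 0; 0]

% Logical indexing pulls out the matching rows of M
rows = M(mask, :);
% rows = [1 3 5 6; 1 8 5 1]
```

On R2016b or newer, implicit expansion should let you write M(:,[1 3]) == r(1,:) and get the same mask without bsxfun.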

If you want this to be more efficient, you can do what Divakar suggested and use only the third output of unique. This'll make things easier to read too:

>> M=[ 1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4];
>> [~,~,r] = unique(M(:,[1 3]), 'rows', 'stable');
>> A = arrayfun(@(x) M(r == x, :), 1:max(r),'uni', 0);
>> celldisp(A);

A{1} =

     1     3     5     6
     1     8     5     1

A{2} =

     3     6     5     4

A{3} =

     4     6     5     7

A{4} =

     3     6     4     5
     3     6     4     4

Certainly more readable! What's happening now is that the third output of unique assigns each row of the input an ID from 1 up to the number of unique rows, so rows with the same combination of values get the same ID. Then, we simply find the maximum ID and iterate from 1 up to it, where at each iteration the ID is used to extract the rows of M that correspond to one unique combination of the first and third columns.
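To make the IDs concrete, this is what the third output looks like for the M in the question. The variable group4 is just an illustrative name of mine:

```matlab
M = [1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4];
[~,~,r] = unique(M(:,[1 3]), 'rows', 'stable');
% r = [1; 2; 1; 3; 4; 4], i.e. one group ID per row of M

% Rows 5 and 6 share the pair [3 4], so both carry ID 4
group4 = M(r == 4, :);
% group4 = [3 6 4 5; 3 6 4 4]
```

Rows 1 and 3 both get ID 1 because they share the pair [1 5], which is exactly the first group in the output above.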

Except you can use the third output from `unique` and hence avoid `bsxfun`. — Divakar, Feb 12 '15 at 04:56

score 1 · Answer 2 · edited May 23 '17 at 11:56

For me, partitioning means accumarray in MATLAB. The first step is basically the same as in rayryeng's solution: use unique to get the new position (group index) of each row. The second step: based on those new positions, fetch the rows of M and put them in a cell.

[~,~,I] = unique(M(:,[1 3]), 'rows', 'stable');
A = accumarrayStable(I, 1:length(I), [], @(J) {M(J,:)});

As these row-positions won't be sorted we need a stable version of accumarray, which I take from this answer. If you don't care about the order of the rows in each A{i}, you won't need this, but can go with the faster accumarray right away.

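function A = accumarrayStable(subs, val, varargin)
% Sort the subscripts with the last column as the primary key (this matches
% linear-index order); I records the permutation that sortrows applied.
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1));
% Reorder val to line up with the sorted subscripts, then accumulate.
A = accumarray(subs, val(I), varargin{:});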

Overall going with this solution should give you a 5-10x speedup with respect to the arrayfun-solution, depending on your matrix size of course.
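If you'd rather not carry a helper function around, one possible variant (my own sketch, not taken from the linked answer) is to call plain accumarray and sort the row indices inside the anonymous function, which should restore the original row order within each group:

```matlab
M = [1 3 5 6; 3 6 5 4; 1 8 5 1; 4 6 5 7; 3 6 4 5; 3 6 4 4];
[~,~,I] = unique(M(:,[1 3]), 'rows', 'stable');

% accumarray may pass the row indices J to the function in any order,
% so sort them before indexing to keep the rows in their original order.
A = accumarray(I, (1:numel(I)).', [], @(J) {M(sort(J),:)});
% A{1} = [1 3 5 6; 1 8 5 1]
% A{4} = [3 6 4 5; 3 6 4 4]
```

Note that A comes back as a 4-by-1 cell column here, whereas the arrayfun version returns a 1-by-4 cell row. I haven't timed this variant, so I can't say how the extra sort affects the speedup.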

Seems really fast indeed! Good job. — Divakar, Feb 12 '15 at 12:31
