I'm trying to write a search algorithm that finds the unique columns of a matrix stored in a cell, up to a tolerance level. MATLAB's unique function (R2012a) does not accept a tolerance input. Below is the code I have so far; for now I only check uniqueness against the first column (j=1), but this needs to be extended later.

The output: I obtain a store cell which contains all the vectors except the duplicates of [0;1;0]. However, other duplicates are kept (e.g. [1;0;-0.4238]).

clear all; close all; clc;
%%
tolerance=1e-6;
U_vector{1} = [0  1      0 1       1      0 1      0 1          1;
               1  0      1 0       0      1 0      1 0          0;
               0 -0.4238 0 0.4238 -0.4238 0 0.4238 0 0.8161001 -0.8161];

for i = 1:1:size(U_vector,2)
    k=1;
    store{i}(:,k) = U_vector{i}(:,k);
    for j=1;%:1:(size(U_vector{i},2))
        for m=j:1:(size(U_vector{i},2))
            if (abs(U_vector{i}(:,j)-U_vector{i}(:,m)) >= tolerance)
                k=k+1;
                store{i}(:,k) = U_vector{i}(:,m);
            end
        end
    end
end
chappjc
user5489
  • This question is a bit different because you are asking about unique rows, so I've answered here too, but you may also be interested in the answers to [this other question](http://stackoverflow.com/questions/1988535/return-unique-element-with-a-tolerance/20850949#20850949). – chappjc Dec 11 '14 at 18:08
  • Thanks for the link to the other question, I did find that when looking for a possible solution. – user5489 Dec 11 '14 at 18:54

3 Answers

There's an undocumented function to merge similar points, which works on rows too:

>> u = [0  1      0 1       1      0 1      0 1          1;
        1  0      1 0       0      1 0      1 0          0;
        0 -0.4238 0 0.4238 -0.4238 0 0.4238 0 0.8161001 -0.8161];
>> uMerged = builtin('_mergesimpts',u.',0.3).'
uMerged =
         0    1.0000    1.0000    1.0000    1.0000
    1.0000         0         0         0         0
         0   -0.8161   -0.4238    0.4238    0.8161

In your case, just set u = U_vector{1}; and then pack the result in a cell too (out{1} = uMerged;).
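If U_vector holds more than one matrix, the same call can be applied cell by cell. This is only a sketch: it assumes the undocumented '_mergesimpts' builtin behaves as shown above in your MATLAB release, and the names out and tolerance are illustrative (the sample data is trimmed for brevity).

```matlab
% Sketch: merge near-duplicate columns in every cell of U_vector.
% Assumes the undocumented builtin '_mergesimpts' is available in your release.
tolerance = 0.3;                          % illustrative value, as in the call above
U_vector{1} = [0  1      0 1      1;
               1  0      1 0      0;
               0 -0.4238 0 0.4238 -0.4238];

out = cell(size(U_vector));               % one result per input cell
for i = 1:numel(U_vector)
    % The builtin works on rows, so transpose going in and coming out.
    out{i} = builtin('_mergesimpts', U_vector{i}.', tolerance).';
end
```

Each out{i} then holds the merged columns of U_vector{i}.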

Also, the function can take a vector of tolerances, one for each column. From the message this function prints at the command line:

Tolerance must be a scalar or a vector with the same number of columns as the first input 'X'.

So this works too:

uMerged = builtin('_mergesimpts',u.',[eps eps 0.3]).'

BTW: There will probably be an official function for this in the future, but we're not allowed to discuss :).
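For anyone who would rather stick to documented functions, here is a rough sketch of one way to get a similar result with sortrows and diff. It is not how '_mergesimpts' works internally (that is not documented); it simply assumes an absolute, per-component tolerance. It also only compares each point with its neighbour after sorting, so near-duplicates that the lexicographic sort does not place next to each other would be missed. The names pts, step, isNew and uUnique are illustrative.

```matlab
% Sketch only: assumes an absolute, per-component tolerance.
tolerance = 1e-6;
u = [0  1      0 1       1      0 1      0 1          1;
     1  0      1 0       0      1 0      1 0          0;
     0 -0.4238 0 0.4238 -0.4238 0 0.4238 0 0.8161001 -0.8161];

pts   = sortrows(u.');                      % one point per row, sorted lexicographically
step  = abs(diff(pts, 1, 1));               % component-wise gap between consecutive rows
isNew = [true; any(step >= tolerance, 2)];  % keep a row if it differs from the row before it
uUnique = pts(isNew, :).';                  % back to one point per column
```

With the sample data this keeps five columns, sorted like the output above.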

chappjc
  • And the output is also sorted (like `unique`). Nice! – Hoki Dec 11 '14 at 18:12
  • This helped a lot! But, how does this actually take the tolerance into account? Too bad it is an undocumented function :p. – user5489 Dec 11 '14 at 19:07
  • @user5489 I'm pretty sure it's based on the absolute difference of corresponding vector components (considering columns separately), not a scaled tolerance (but maybe) or a vector distance. – chappjc Dec 11 '14 at 19:12
  • @user5489 I just remembered you can also specify different tolerances for each column (updated answer). – chappjc Dec 11 '14 at 19:21
  • How do you get all that knowledge on undocumented functions?! +1 – Luis Mendo Dec 11 '14 at 22:42
  • @LuisMendo Browsing through MATLAB source, of course! ;) And [undocumentedmatlab.com](http://undocumentedmatlab.com/). Yair Altman is a great resource. – chappjc Dec 11 '14 at 23:18

You do not need so many nested loops. This works with the sample you provided.

It uses a working table which is reduced as duplicates are found.

for ii = 1:1:size(U_vector,2)
    A = U_vector{ii} ;          %// create a working copy of the current table
    store{ii} = [] ;            %// initialize the result cell array
    endOfTable = false ;
    while ~endOfTable
        store{ii}(:,end+1) = A(:,1) ;                   %// save the first column of the table
        idx = any( abs( bsxfun(@minus,A(:,2:end),A(:,1)) ) >= tolerance , 1 ) ;   %// find the indices of the columns not within the tolerance
        A = A(:, [false idx] ) ;                        %// remove the duplicate columns in A
        if size(A,2) < 2 ; endOfTable = true ; end      %// exit loop if we reached the last column
    end
    %// store last column if it remained unmatched
    if size(A,2) == 1
        store{ii}(:,end+1) = A(:,1) ;
    end
end

Which outputs, with your data:

>> store{1}
ans =
         0    1.0000    1.0000    1.0000    1.0000
    1.0000         0         0         0         0
         0   -0.4238    0.4238    0.8161   -0.8161
Hoki

What about this?!:

u = U_vector{1};                   % U_vector{1} is already a numeric matrix
i=1;
while i<=size(u,2)
  test=repmat(u(:,i),1,size(u,2)); % column i replicated to the size of u
  same=abs(u-test)<tolerance;      % entries within the tolerance (from the question) of column i
  differentCols = ~all(same,1);    % column indices that are not equal to column i
  differentCols(i)=true;           % ensure column i itself stays in u
  u=u(:,differentCols);            % new u-> keep different columns
  i=i+1;                           % next column
end
u                                  % print u

Seems to work for me, but no guarantees.

Thomas