3

Given two vectors containing numerical values, say for example

a=1.:0.1:2.;
b=a+0.1;

I would like to select only the differing values. For this Matlab provides the function setdiff. In the above example it is obvious that setdiff(a,b) should return 1. and setdiff(b,a) gives 2.1. However, due to computational precision (see the questions here or here) the result differs. I get

>> setdiff(a,b)
   ans = 
      1.0000 1.2000 1.4000 1.7000 1.9000

Matlab provides a function which returns a lower limit to this precision error, eps. This allows us to estimate a tolerance like tol = 100*eps;

My question now, is there an intelligent and efficient way to select only those values whose difference is below tol? Or said differently: How do I write my own version of setdiff, returning both values and indexes, which includes a tolerance limit?

I don't like the way it is answered in this question, since matlab already provides part of the required functionality.

Community
  • 1
  • 1
Jost
  • 410
  • 7
  • 19
  • 1
    I am not sure if this elegant or "approved" by any source, but you can use a tolerance, `tol` of a very small number or which could be a multiple of `eps` and get the exclusive values using `bsxfun` like so - `setdiff_BA = b(~any(abs(bsxfun(@minus,b,a')) – Divakar Jul 31 '14 at 13:36
  • @Schorsch You are right on this. We definitely need a value `tol > eps`. – Jost Jul 31 '14 at 13:43
  • I think I found a sweeter solution to this. Since MATLAB performs subtractions with `setdiff`, you can use `eps` directly here as the `tolerance` with `<=`, like so - `setdiff_BA = b(~any(abs(bsxfun(@minus,b,a'))<=eps,1))`. Thus, you don't have to worry about `tol`. If MATLAB were doing multiplication or division, the tolerance would not be `eps` and would change accordingly. It's sort of a logical estimate without any reference though. – Divakar Jul 31 '14 at 13:50
  • @Jost You could just round both of your matrices off to `tol` (i.e. `a_rounded = round(a/tol)*tol`) and then call `setdiff` on the rounded matrices – Dan Jul 31 '14 at 13:55
  • @Divakar I have updated the question. Also, I like your solution. Please formulate an answer reflecting the small changes. Remember that the answer should be valid for matrices as well. – Jost Jul 31 '14 at 13:55
  • @Jost Your question is specific to vectors because `setdiff` works on vectors only. If you want to extend the solution to 2D matrices too, then maybe you could suggest some function that operates on those? – Divakar Jul 31 '14 at 13:58
  • @Dan Good idea, but the `a_rounded` is again limited by numerical precision, so this does not work as expected. – Jost Jul 31 '14 at 14:07
  • @Jost but they will be limited in the same way so the precision shouldn't be a problem. Have you tested it on your numbers? – Dan Jul 31 '14 at 14:08
  • @Dan, yes for my numbers it's not working, unfortunately. – Jost Jul 31 '14 at 14:13
  • 1
    @Divakar Your method works for me. And you are right, I was fooled by the description in the matlab documentation, there is no "real" matrix comparison here. I'll replace the "matrixes" by "vectors" in my question. – Jost Jul 31 '14 at 14:15
  • @Jost Posting the solution in a bit with some comments for the code. – Divakar Jul 31 '14 at 14:17
  • @Jost it does work on your example numbers though... so how do your actual numbers differ to stop the rounding from working? – Dan Jul 31 '14 at 14:23

2 Answers2

2

Introduction and custom function

In a general case with floating point precision issues, one would be advised to use a tolerance value for comparisons against suspected zero values and that tolerance must be a very small value. A little robust method would use a tolerance that uses eps in it. Now, since MATLAB basically performs subtractions with setdiff, you can use eps directly here by comparing for lesser than or equal to it to find zeros.

This forms the basis of a modified setdiff for floating point numbers shown here -

function [C,IA] = setdiff_fp(A,B)
%//SETDIFF_FP Set difference for floating point numbers.
%//   C = SETDIFF_FP(A,B) for vectors A and B, returns the values in A that 
%//   are not in B with no repetitions. C will be sorted.
%//
%//   [C,IA] = SETDIFF_FP(A,B) also returns an index vector IA such that
%//   C = A(IA). If there are repeated values in A that are not in B, then
%//   the index of the first occurrence of each repeated value is returned.

%// Get 2D matrix of absolute difference between each element of A against 
%// each element of B
abs_diff_mat = abs(bsxfun(@minus,A,B.')); %//'

%// Compare each element against eps to "negate" the floating point
%// precision issues. Thus, we have a binary array of true comparisons.
abs_diff_mat_epscmp = abs_diff_mat<=eps;

%// Find indices of A that are exclusive to it
A_ind = ~any(abs_diff_mat_epscmp,1);

%// Get unique(to account for no repetitions and being sorted) exclusive 
%// A elements for the final output alongwith the indices
[C,IA] = intersect(A,unique(A(A_ind)));

return;

Example runs

Case1 (With integers)

This will verify that setdiff_fp works with integer arrays just the way setdiff does.

A = [2 5];
B = [9 8 8 1 2 1 1 5];
[C_setdiff,IA_setdiff] = setdiff(B,A)
[C_setdiff_fp,IA_setdiff_fp] = setdiff_fp(B,A)

Output

A =
     2     5
B =
     9     8     8     1     2     1     1     5
C_setdiff =
     1     8     9
IA_setdiff =
     4
     2
     1
C_setdiff_fp =
     1     8     9
IA_setdiff_fp =
     4
     2
     1

Case2 (With floating point numbers)

This is to show that setdiff_fp produces the correct results, while setdiff doesn't. Additionally, this will also test out the output indices.

A=1.:0.1:1.5
B=[A+0.1 5.5 5.5 2.6]
[C_setdiff,IA_setdiff] = setdiff(B,A)
[C_setdiff_fp,IA_setdiff_fp] = setdiff_fp(B,A)

Output

A =
    1.0000    1.1000    1.2000    1.3000    1.4000    1.5000
B =
    1.1000    1.2000    1.3000    1.4000    1.5000    1.6000    5.5000    5.5000    2.6000
C_setdiff =
    1.2000    1.4000    1.6000    2.6000    5.5000
IA_setdiff =
     2
     4
     6
     9
     7
C_setdiff_fp =
    1.6000    2.6000    5.5000
IA_setdiff_fp =
     6
     9
     7
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • 1
    @Jost You should not make substantial code changes to someone else's answer. Either comment with the suggestion for the OP to make themselves, or post your own answer (citing the original). – nobody Jul 31 '14 at 14:50
  • @Jost I got your idea what you are trying to do. I need to make few more edits with it and will incorporate what you had in mind. – Divakar Jul 31 '14 at 14:52
  • 1
    @Divakar Be careful with the `'`! It gives the complex conjugate transpose, not the transposed matrix, as does `.'`. – Jost Jul 31 '14 at 14:52
  • @AndrewMedico Understood, thanks for the hint. But Divakar apparently got to see my changes, which made it a lot easier for him to understand what I meant. Is there an 'official' way of achieving this functionality? – Jost Jul 31 '14 at 14:55
  • @Jost I think what you did was okay. I was still reviewing it though. Main thing is the idea came through to me. I think it would be better to not get into `tol` inside the function, otherwise we would be needed to guide users on what must be `tol`, which circles back to the user being needed to handle FP issues. – Divakar Jul 31 '14 at 14:57
  • @Jost Check out the edits! Phew, that part of adding the indices was some work!! Hope this works out for you. – Divakar Jul 31 '14 at 17:27
  • @Jost I was curious if the indices finding part looked good too. If the solution has met the expectations of the question, consider accepting it, so that people with similar issues could find it easily, as accepted solutions stay at the top, maybe you know about that. – Divakar Aug 02 '14 at 05:26
0

For Tolerance of 1 epsilon This should work:

a=1.0:0.1:2.0;
b=a+0.1;
b=[b b-eps b+eps];
c=setdiff(a,b)

The idea is to expand b to include also its closest values.

Mendi Barel
  • 3,350
  • 1
  • 23
  • 24