I have a vector, e.g. A=[2.30 2.32 2.67 2.44 2.31 1.23], and I want to find all the values within it that are closest (almost equal) to each other. For the example above, the answer should be indices 1, 2 and 5.

I don't know how to prescribe the tolerance, but the resulting values should be almost equal to each other. Can anybody provide a hint?

erbal

3 Answers

I suggest the following approach:

%initialize A 
A=[2.30 2.32 2.67 2.44 2.31 1.23];

%initialize an epsilon parameter which defines how close 2 values must be to one another to be considered identical
EPSILON = 0.05; 

%generate the list of all possible index pairs from A
n = numel(A);
[p,q] = meshgrid(1:n);
mask = logical(tril(ones(n,n))-eye(n,n));
allPairs = [p(mask),q(mask)];

%find pairs with absolute difference below epsilon
validPairs = abs(A(allPairs(:,1))- A(allPairs(:,2))) < EPSILON;

%result - pairs of numbers which are close to one another
allPairs(validPairs,:)

Result:

ans =

 1     2
 1     5
 2     5

*The code for generating all possible pairs is taken from @Lambdageek's solution
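The question asks for a flat list of indices (1, 2 and 5) rather than pairs. As a minimal sketch, assuming the same EPSILON as above, the pair list can be collapsed with unique. The name closeIdx is introduced here for illustration and is not part of the original answer.

```matlab
% Same setup as the answer above
A = [2.30 2.32 2.67 2.44 2.31 1.23];
EPSILON = 0.05;
n = numel(A);

% All index pairs (i,j) with i < j
[p, q] = meshgrid(1:n);
mask = logical(tril(ones(n, n)) - eye(n, n));
allPairs = [p(mask), q(mask)];

% Keep the pairs whose values differ by less than EPSILON
validPairs = abs(A(allPairs(:, 1)) - A(allPairs(:, 2))) < EPSILON;

% Collapse the pairs into a sorted list of distinct indices
% (closeIdx is a hypothetical name, not from the original answer)
closeIdx = unique(allPairs(validPairs, :))';
disp(closeIdx);
```

Note that this merges all close pairs into one list, so it only matches the expected answer when the data contains a single group of close values.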

ibezito
  • I cannot guess the tolerance (EPSILON) in advance. It may be different for another example. – erbal Jul 01 '16 at 18:47

If you want to express distance in mathematical terms you can use the Euclidean Distance. Here is the expression:

d(xi, xj) = sqrt((xi - xj)^2) = |xi - xj|
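For scalar values the Euclidean distance sqrt((xi - xj)^2) is simply the absolute difference |xi - xj|, so all pairwise distances can be computed in one step. The following is a rough sketch, not part of the original answer; the names D, nearestDist and nearestIdx are chosen here for illustration.

```matlab
A = [2.30 2.32 2.67 2.44 2.31 1.23];

% Pairwise distance matrix: D(i,j) = |A(i) - A(j)|
D = abs(bsxfun(@minus, A.', A));

% Set the diagonal to Inf so an element is never its own nearest neighbour
D(logical(eye(numel(A)))) = Inf;

% For every element, the distance to and the index of its nearest neighbour
[nearestDist, nearestIdx] = min(D, [], 2);
disp([nearestIdx, nearestDist]);
```

Small nearestDist values point at elements that have a close partner, which can help when choosing a threshold by inspection.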

If you have a higher-dimensional space (which you have), you can find more information on Wikipedia, but it is still straightforward:

https://en.wikipedia.org/wiki/Euclidean_distance#n_dimensions

Since the Euclidean Distance is not the best distance measure in higher dimensional spaces, some people suggest the Cosine Similarity:

https://en.wikipedia.org/wiki/Cosine_similarity

You could also use an algorithm such as k-means or k-nearest-neighbors to solve this task.

If you are just looking for the most similar values in the vector:

  • Define a threshold, let's say 0.01.

  • Select the first element of the vector (xi, where i=1).

  • Select the next element after xi (xj, where j=i+1).

  • Compare xi with xj by, for example, dist = sqrt((xi - xj)^2). If dist is smaller than or equal to your threshold, xi and xj are very similar.

  • Increment j and compare again.

  • When j reaches the end of the vector, increment i and reset j to i+1.

  • Do this until you have compared all pairs of elements.
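The steps above can be sketched as a pair of nested loops. This is an illustration under stated assumptions: the threshold of 0.05 is an assumed value (a threshold equal to the spacing of the data, such as 0.01 here, can make the result depend on floating-point rounding), and closePairs is a name introduced for this example.

```matlab
A = [2.30 2.32 2.67 2.44 2.31 1.23];
threshold = 0.05;            % assumed value, tune for your data

n = numel(A);
closePairs = zeros(0, 2);    % each row will hold one pair of indices [i, j]

for i = 1:n-1
    for j = i+1:n
        % For scalars, sqrt((xi - xj)^2) equals abs(xi - xj)
        dist = abs(A(i) - A(j));
        if dist <= threshold
            closePairs(end+1, :) = [i, j];
        end
    end
end

disp(closePairs);
```

Each pair is visited once, so the loop performs n*(n-1)/2 comparisons.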
Bastian

This approach does not need a predefined absolute tolerance. Instead, it uses a tolerance relative to the smallest difference between values, so it always looks for the closest group in the data. In this form it will not work if you have exact duplicate values in your data, but you can easily extend it to handle that case as well.

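%example data
A=[2.30 2.32 2.67 2.44 2.31 1.23];
%tolerance, expressed as a multiple of the smallest gap between sorted neighbours
diffFactor=3;

%gaps between neighbouring values after sorting
Asorted=sort(A);
Adiff=abs(Asorted(1:end-1)-Asorted(2:end));
[minDiff,minInd]=min(Adiff);

%lower value of the closest pair, used as the centre of the group
commonValue=Asorted(minInd);

%indices of all values within diffFactor*minDiff of commonValue (no semicolon, so the result is displayed)
resultIndex=find(A>=commonValue-diffFactor*minDiff & A<=commonValue+diffFactor*minDiff)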
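One possible way to extend this to data with exact duplicates is to take the smallest nonzero gap as the reference difference. This is only a sketch of that idea and an assumption about what such an extension could look like; the seventh value in A and the name nonzeroDiff are added here for illustration.

```matlab
% Example data with one exact duplicate (the trailing 2.31 is added for illustration)
A = [2.30 2.32 2.67 2.44 2.31 1.23 2.31];
diffFactor = 3;

Asorted = sort(A);
Adiff = diff(Asorted);               % non-negative, since Asorted is ascending

% Ignore zero gaps caused by duplicates (assumed extension, not from the answer)
nonzeroDiff = Adiff(Adiff > 0);

if isempty(nonzeroDiff)
    % All values are identical, so every index belongs to the group
    resultIndex = 1:numel(A);
else
    minDiff = min(nonzeroDiff);
    minInd = find(Adiff == minDiff, 1);
    commonValue = Asorted(minInd);
    resultIndex = find(abs(A - commonValue) <= diffFactor * minDiff);
end

disp(resultIndex);
```

The choice of diffFactor remains a judgment call: larger values widen the group around the closest pair.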
Juha Lipponen
  • If two values happen to be the same in the dataset, then this code will ignore all closer higher/lower values. And how do I decide diffFactor? – erbal Jul 01 '16 at 18:44