How can I improve the distance calculation between two separate datasets?

This is the code:

X = [   3.6     79
        1.8     54
        3.333   74
        2.283   62
        4.533   85
        2.883   55
        4.7     88
        3.6     85
        1.95    51
        4.35    85
        1.833   54
        3.917   84
        4.2     78
        1.75    47
        4.7     83
        2.167   52
        1.75    62
        4.8     84
        1.6     52
        4.25    79
        1.8     51
        1.75    47
        3.45    78
        3.067   69
        4.533   74
        3.6     83
        1.967   55
        4.083   76
        3.85    78
        4.433   79
        4.3     73
        4.467   77
        3.367   66
        4.033   80
        3.833   74
        2.017   52
        1.867   48
        4.833   80
        1.833   59
        4.783   90  ]
    clc;  
    close all; 
    figure;
    h(1) = plot(X(:,1),X(:,2),'bx');
    hold on;
    X1 = X(1:3,:);
    X2 = X(4:40,:);
    h(2) = plot(X1(1:3,1), X1(1:3,2),'rs','MarkerSize',10);
    k=5;
    [D2 ind] = sort(squeeze(sqrt(sum(bsxfun(@minus,X2,permute(X1,[3 2 1])).^2,2))))
    ind_closest = ind(1:k,:)
    x_closest = X(ind_closest,:)
    for j = 1:length(x_closest);
        h(3) =plot(x_closest(j,1),x_closest(j,2),'ko','MarkerSize',10);
    end

The output is shown in the picture below:

[image: plot produced by the code above]

The problem is that the code does not pick the data points closest to the red-square points. I also tried the pdist2 function from the Statistics Toolbox, and the result is similar to what I get with the bsxfun approach in my code. I'm not sure which part of the code needs to change so that it picks the data points closest to the targets. I would really appreciate any help improving my code.

amjay
  • We need your dataset, because your code, as is, is not reproducible. – Tommaso Belluzzo Feb 25 '18 at 00:02
  • @TommasoBelluzzo the csv file is available here: https://forge.scilab.org/index.php/p/rdataset/source/tree/master/csv/datasets/faithful.csv – amjay Feb 25 '18 at 07:04
  • Possible duplicate of [Finding K-nearest neighbors and its implementation](https://stackoverflow.com/questions/27475978/finding-k-nearest-neighbors-and-its-implementation) – Hunter Jiang Feb 26 '18 at 01:02
  • @Hunter Jiang, thank you very much for the URL. I tried it and it works as expected. – amjay Feb 26 '18 at 12:30

1 Answer

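  • If "closest" means closest among all points in X, the `sort` line and the `ind_closest` line should be replaced with the following (the index range starts at 2 because each row of X1 is also a row of X, so the first sorted entry is the point itself at distance 0):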

    [D2 ind] = sort(squeeze(sqrt(sum(bsxfun(@minus,X,permute(X1,[3 2 1])).^2,2))))

    ind_closest = ind(2:k+1,:)

  • If "closest" means closest among the points in X2, keep your distance computation and index into X2 instead of X, because `ind` holds row numbers of X2:

    x_closest = X2(ind_closest,:)
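
Put together, the second variant might look like the sketch below. The small matrix is a hypothetical stand-in (the first eight rows of the question's X) so that the snippet runs on its own, and `k = 2` is chosen only for illustration. The point to notice is that `ind` comes from sorting distances computed on `X2`, so it is `X2` that has to be indexed.

```matlab
% Hypothetical stand-in data: first 8 rows of the question's X
X = [3.6   79
     1.8   54
     3.333 74
     2.283 62
     4.533 85
     2.883 55
     4.7   88
     3.6   85];

X1 = X(1:3,:);      % target points
X2 = X(4:end,:);    % candidate neighbours
k  = 2;             % illustrative value

% D(i,j) = Euclidean distance from X2(i,:) to X1(j,:)
D = squeeze(sqrt(sum(bsxfun(@minus, X2, permute(X1,[3 2 1])).^2, 2)));

[~, ind] = sort(D);             % sorts each column; ind holds row numbers of X2
ind_closest = ind(1:k,:);       % k-by-3, one column of neighbours per target
x_closest = X2(ind_closest,:);  % index X2, not X
```

Each column of `ind_closest` lists the `k` nearest rows of `X2` for one target, so `x_closest` has `k*size(X1,1)` rows and can be passed to a single `plot` call.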

I also modified your code slightly, since the loop that plots h(3) can be replaced with a single plot call.

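clc; clear; close all;
% Alternative data set:
%load fisheriris
%X=meas(:,3:4);
load X                      % load the data matrix X (assumes it was saved as X.mat)
X=unique(X,'rows');         % drop duplicate points

figure;
h(1) = plot(X(:,1),X(:,2),'bx');
hold on;

X1 = X([5 15 30],:);        % target points
h(2) = plot(X1(:,1), X1(:,2),'rs','MarkerSize',10);
% Column j of D2 holds the sorted distances from every row of X to X1(j,:)
[D2,ind] = sort(squeeze(sqrt(sum(bsxfun(@minus,X,permute(X1,[3 2 1])).^2,2))));

k=3;
% Skip the first entry of each column (the target itself, distance 0)
ind_closest = unique(ind(2:k+1,:));
x_closest = X(ind_closest,:);
h(3) =plot(x_closest(:,1),x_closest(:,2),'ko','MarkerSize',10);
axis equal                  % same scale on both axes, so plotted distances are true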

It seems to be working fine.

[image: plot produced by the modified code]
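
The `2:k+1` indexing relies on each target being the first entry in its own sorted column, which can fail when X contains duplicate rows. A minimal alternative sketch, not part of the code above, masks each target's own distance with `Inf` and then takes `1:k`. The data is a hypothetical stand-in (a few rows of the question's X), the names `idx` and `D` are introduced only for this illustration, and implicit expansion (MATLAB R2016b or later) is assumed in place of `bsxfun`.

```matlab
% Hypothetical stand-in data (a few rows of the question's X)
X = [3.6   79
     1.8   54
     3.333 74
     2.283 62
     4.533 85
     2.883 55];
idx = [1 2];          % rows of X used as targets (illustrative choice)
X1  = X(idx,:);
k   = 2;

% D(i,j) = distance from X(i,:) to X1(j,:); relies on implicit expansion
D = squeeze(sqrt(sum((X - permute(X1,[3 2 1])).^2, 2)));

% Exclude each target from its own neighbour list
for j = 1:numel(idx)
    D(idx(j), j) = Inf;
end

[~, ind] = sort(D);
ind_closest = unique(ind(1:k,:));   % row numbers into X, duplicates merged
x_closest = X(ind_closest,:);
```

If the Statistics and Machine Learning Toolbox is available, `pdist2(X, X1)` should return the same distance matrix before masking.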

Hunter Jiang
  • @amjay I've tried your data. Note that your `csv` file has 3 columns so use `X = faithfuldat(1:40,2:3);` instead. – Hunter Jiang Feb 25 '18 at 08:03
  • I just updated my data. I tried to implement your suggestion, and it improves things a little, but it still does not pick the closest data points as shown on this page: https://www.codementor.io/tips/9712834528/finding-k-nearest-neighbour-with-matlab – amjay Feb 25 '18 at 12:45
  • It seems to work. I updated my post; please point out what is wrong. – Hunter Jiang Feb 26 '18 at 00:51
  • Try adding `axis equal` at the end; with the same scale on both axes you may find that the answer is right. – Hunter Jiang Feb 26 '18 at 03:37
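
The `axis equal` remark matters because the two columns have very different ranges (roughly 1.6 to 4.8 versus 47 to 90 in the question's data), so the Euclidean distance is dominated by the second column while the default plot stretches the first. If the goal is neighbours that look close on the default plot, one option is to standardize each column before computing distances. This is an assumption offered for illustration, not something proposed in the thread, and the data and variable names below are hypothetical stand-ins.

```matlab
% Hypothetical stand-in data (a few rows of the question's X)
X = [3.6   79
     1.8   54
     3.333 74
     2.283 62
     4.533 85
     2.883 55];

% Standardize each column to zero mean and unit standard deviation
mu = mean(X, 1);
sd = std(X, 0, 1);
Z  = bsxfun(@rdivide, bsxfun(@minus, X, mu), sd);

target = 1;           % row of X used as the target (illustrative choice)
k = 2;

% Distances from every row to the target, in raw and standardized units
d_raw = sqrt(sum(bsxfun(@minus, X, X(target,:)).^2, 2));
d_std = sqrt(sum(bsxfun(@minus, Z, Z(target,:)).^2, 2));

[~, i_raw] = sort(d_raw);
[~, i_std] = sort(d_std);
nn_raw = i_raw(2:k+1);   % skip the target itself (distance 0)
nn_std = i_std(2:k+1);
```

Comparing `nn_raw` with `nn_std` shows whether the choice of neighbours depends on the scaling; when the two differ, the raw result is still the correct Euclidean answer for the unscaled data.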