
I have a set of about 3500 vectors, each with 4096 components, and I need a fast way to find, given another input vector of the same length, the N nearest ones. I would like to use MATLAB functions to do that. Is this suitable for what I need?

https://uk.mathworks.com/help/stats/classificationknn-class.html

Ander Biguri
D.Giunchi
  • Hint: If your question can be answered with "Yes", then it is very likely not fit for Stack Overflow. Note that probably the answer to this one is "No", but still. – Ander Biguri Jun 13 '17 at 08:38
  • The [K-Nearest Neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm groups the vectors into `K` clusters. Each cluster center is the average (center of gravity) of its cluster members. To find the nearest vector, you would first identify the nearest cluster center and then search the cluster to find the nearest vector within the cluster. – Axel Kemper Jun 13 '17 at 08:48

1 Answer


What you linked to is a k-nearest-neighbour classifier, which assigns class labels to new points based on labelled training data. Not sure this is what you want. If you simply want the N vectors with the smallest distance to your input vector, you can do it manually easily enough. Something like:

    distances = matrixOfvectors - yourVector; % repmat(your...) if you have older MATLAB.
    [val, pos] = sort(sum(distances.^2, 2)); % Sum might need 1 instead of 2, depending on whether vectors are rows or columns.
    minVectors = pos(1:N); % Indices of the N closest vectors.
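
To make the steps concrete, here is a minimal sketch on toy data. The 5-by-3 matrix, the query vector, and N = 3 are made up for illustration, and the vectors are assumed to be stored one per row. `bsxfun` is used for the subtraction so the snippet also runs on MATLAB versions without implicit expansion.

```matlab
% Hypothetical toy data: 5 vectors with 3 components each, one vector per row.
matrixOfvectors = [0 0 0; 1 1 1; 5 5 5; 0 2 1; 9 9 9];
yourVector = [0 0 1];
N = 3;

% Subtract the query from every row; bsxfun avoids the need for repmat
% and does not rely on implicit expansion.
distances = bsxfun(@minus, matrixOfvectors, yourVector);

% Squared Euclidean distance of each row to the query.
sqDist = sum(distances.^2, 2);

% Sort ascending and keep the first N positions.
[sortedSqDist, pos] = sort(sqDist);
minVectors = pos(1:N);

% The square root is only needed if the actual distances are wanted;
% the ordering is the same with or without it.
nearestDist = sqrt(sortedSqDist(1:N));
```

With this toy data the squared distances are 1, 2, 66, 4 and 226, so `minVectors` holds rows 1, 2 and 4.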

If N is small, say 3 or fewer, it would be slightly faster to avoid the full sort: keep the N smallest distances found so far and compare each new distance with the 2nd smallest first, then with the 1st or 3rd depending on the outcome.
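
One simple way to skip the full sort is sketched below. It is not the exact comparison scheme described above but a plainer variant: call `min` N times and mask out each winner with `Inf`. The distance values are made up for illustration. MATLAB R2017b and later also provide `mink`, which returns the k smallest elements and their indices directly.

```matlab
% Hypothetical squared distances for 6 stored vectors.
sqDist = [7; 1; 9; 4; 2; 8];
N = 3;

remaining = sqDist;          % working copy that gets masked
nearestIdx = zeros(N, 1);    % indices of the N smallest distances, in order
for k = 1:N
    [~, nearestIdx(k)] = min(remaining);  % index of the current smallest value
    remaining(nearestIdx(k)) = Inf;       % exclude it from the next pass
end
```

Each pass is linear in the number of vectors, so this only pays off when N is much smaller than the number of vectors.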

Zizy Archer
  • Unfortunately this method would be time-consuming, since my vectors have 4096 components and the number of comparisons is ~3500. So I need something faster. – D.Giunchi Jun 13 '17 at 09:17
  • I was wrong. Apparently in MATLAB a script like this: `tic; sA = size(A); B = rand(1, 4096); C = repmat(B, sA(1), 1); disAC = A - C; [val, pos] = sort(sum(disAC.^2, 2)); minVectors = pos(1:10); toc` takes about 0.1 seconds, which is OK for my purpose. Thank you very much! – D.Giunchi Jun 13 '17 at 09:57
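
For readability, the timing script from the last comment can be laid out one statement per line. This is a sketch under one assumption: `A` is not defined in the comment, so it is filled here with random data of the size given in the question (3500 vectors of 4096 components, one per row).

```matlab
% Assumption: A stands in for the stored vectors (3500 rows, 4096 columns).
A = rand(3500, 4096);

tic;
sA = size(A);
B = rand(1, 4096);                       % query vector
C = repmat(B, sA(1), 1);                 % copy the query once per row of A
disAC = A - C;                           % per-component differences
[val, pos] = sort(sum(disAC.^2, 2));     % sort squared distances, ascending
minVectors = pos(1:10);                  % indices of the 10 nearest vectors
toc
```

The elapsed time printed by `toc` will depend on the machine and the MATLAB version.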