I have a list of N 2-d vectors and want to find out which k of them (e.g. k = 3) appear most often.
Vectors whose difference is less than a threshold th should be counted as the same (measured e.g. by their distance; or what would be the best similarity measure here?). All similar vectors can be aggregated by their mean.
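One simple way to turn "closer than th counts as the same" into code is greedy threshold grouping (often called leader clustering). This is a minimal sketch, assuming Euclidean distance as the similarity measure and vectors given as (x, y) pairs. `group_vectors` is a hypothetical name. Each vector joins the first existing group whose current mean is within th. Otherwise it starts a new group, and the mean is kept as running sums.

```python
import math

def group_vectors(vectors, th):
    """Hypothetical helper: greedily assign each 2-d vector to the first group
    whose current mean lies within Euclidean distance th, else open a new group.
    Returns a list of (mean, count) pairs in order of group creation."""
    sums = []    # per-group running coordinate sums [sum_x, sum_y]
    counts = []  # per-group member counts
    for x, y in vectors:
        for i, (sx, sy) in enumerate(sums):
            mean = (sx / counts[i], sy / counts[i])
            if math.dist((x, y), mean) < th:
                # Close enough: fold this vector into group i.
                sums[i][0] += x
                sums[i][1] += y
                counts[i] += 1
                break
        else:
            # No existing group is close enough: this vector starts a new one.
            sums.append([x, y])
            counts.append(1)
    return [((sx / c, sy / c), c) for (sx, sy), c in zip(sums, counts)]
```

For the minimal example with th = 0.5, `group_vectors([[1.0, 2.0], [1.1, 2.1], [3.0, 4.0]], 0.5)` gives one group of two with mean about (1.05, 2.05) and one singleton at (3.0, 4.0). Each vector is compared against every group, so this costs O(N·G) for G groups. The result also depends on input order, because a group's mean drifts as members are added.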
So my desired output would be a dictionary of the k vectors with their respective frequencies f.
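Producing that dictionary in Python needs one adjustment: lists are not hashable, so the mean vectors have to be tuples to serve as keys. The sketch below assumes the similar vectors have already been collected into groups, each a non-empty list of (x, y) pairs. `top_k_groups` is a hypothetical name. It uses `heapq.nlargest` to keep only the k biggest groups.

```python
import heapq

def top_k_groups(groups, k):
    """Hypothetical helper: groups is a list of non-empty lists of (x, y)
    vectors judged similar. Returns {mean_vector: frequency} for the k
    largest groups."""
    largest = heapq.nlargest(k, groups, key=len)
    result = {}
    for members in largest:
        n = len(members)
        mean = (sum(v[0] for v in members) / n, sum(v[1] for v in members) / n)
        result[mean] = n  # tuple key: lists would raise TypeError (unhashable)
    return result
```

For example, `top_k_groups([[(1.0, 2.0), (1.1, 2.1)], [(3.0, 4.0)]], 1)` returns a single entry mapping roughly (1.05, 2.05) to 2. Selecting with `nlargest` costs O(G log k) rather than sorting all G groups.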
Minimal Example:
k = 1
input = [[1.0,2.0],[1.1,2.1],[3.0,4.0]]
output = {(1.05, 2.05): 2}
What would be the most efficient algorithm to calculate this? Pseudocode or Python would be nice.
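To avoid comparing every vector against every group, one option is a uniform grid with cell side th. If two points are closer than th, their cell coordinates `floor(x / th)` differ by at most 1 on each axis. A lookup therefore only has to scan the 3x3 block of cells around the point. This sketch is built on stated assumptions: Euclidean distance, a greedy leader clustering (not the only valid interpretation), and the hypothetical function name `most_frequent`. It compares each vector against a group's first member (its "leader") rather than its running mean, so the grid index never has to be updated. Leaders end up at least th apart, so each cell holds only a small bounded number of them. That makes the grouping pass roughly linear in N.

```python
import heapq
import math
from collections import defaultdict

def most_frequent(vectors, th, k):
    """Hypothetical helper: group 2-d vectors by greedy leader clustering
    (Euclidean distance < th) and return {mean_vector: frequency} for the
    k largest groups. Leaders are indexed in a grid of cell side th."""
    grid = defaultdict(list)  # (cx, cy) -> indices of leaders in that cell
    leaders = []              # first member of each group; never moves
    sums = []                 # running [sum_x, sum_y] per group
    counts = []               # number of members per group

    for x, y in vectors:
        cx, cy = math.floor(x / th), math.floor(y / th)
        # First leader within th among the surrounding 3x3 cells, or None.
        match = next(
            (g
             for dx in (-1, 0, 1)
             for dy in (-1, 0, 1)
             for g in grid.get((cx + dx, cy + dy), ())
             if math.dist((x, y), leaders[g]) < th),
            None,
        )
        if match is None:
            # No leader close enough: this vector starts a new group.
            match = len(leaders)
            leaders.append((x, y))
            sums.append([0.0, 0.0])
            counts.append(0)
            grid[(cx, cy)].append(match)
        sums[match][0] += x
        sums[match][1] += y
        counts[match] += 1

    # Keep only the k most populated groups: O(G log k) for G groups.
    top = heapq.nlargest(k, range(len(counts)), key=counts.__getitem__)
    return {(sums[g][0] / counts[g], sums[g][1] / counts[g]): counts[g] for g in top}
```

With the minimal example, `most_frequent([[1.0, 2.0], [1.1, 2.1], [3.0, 4.0]], th=0.5, k=1)` yields one entry with mean about (1.05, 2.05) and frequency 2. Like any greedy grouping, the result depends on input order. A chain of points each closer than th to the next may also be split into several groups rather than merged.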
Edit: vectors that are identical except for pointing in opposite directions (e.g. (1,-1) and (-1,1)) should be counted as the same.
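One way to treat v and -v as equal is a sign-invariant distance: the smaller of |u - v| and |u + v|. Averaging also needs care, since naively averaging (1,-1) and (-1,1) gives (0,0). Each new member should first be flipped to whichever orientation is closer to its group's representative. A minimal sketch, assuming Euclidean distance; `sign_invariant_dist` and `align_to` are hypothetical helper names:

```python
import math

def sign_invariant_dist(u, v):
    """Hypothetical helper: Euclidean distance that treats v and -v as the
    same vector, i.e. min(|u - v|, |u + v|)."""
    return min(math.dist(u, v), math.dist(u, (-v[0], -v[1])))

def align_to(v, ref):
    """Hypothetical helper: return v or -v, whichever is closer to ref, so
    opposite-direction members do not cancel out when summed into a mean."""
    neg = (-v[0], -v[1])
    return neg if math.dist(neg, ref) < math.dist(v, ref) else (v[0], v[1])
```

Here `sign_invariant_dist((1, -1), (-1, 1))` is 0.0 and `align_to((-1, 1), (1, -1))` returns (1, -1). In a grouping loop, you would use `sign_invariant_dist` in place of the plain distance and add `align_to(v, representative)` to the group's sums instead of `v`. A simpler canonicalization, such as flipping every vector so that x >= 0, breaks near the y axis. It maps (-0.01, 1) to (0.01, -1), far from its near-twin (0.01, 1).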