Given: A huge set of N-dimensional vectors - { V1, V2, V3, ..., Vm } Example of a vector:
[72, 100, 34, 45, 87, 123, 99, 32] // N = 8
Input: As an input, we are given another vector with the same dimension as the set described above. Let's call this vector X.
Objective: Find the most similar (or top K similar vectors, K is relatively small)from the given set for vector X provided. Similarity is defined as https://en.wikipedia.org/wiki/Euclidean_distance.
I am looking for an approach that can give me O(log M) complexity, where M is a number of vectors in the set.
Notes, that N could be relatively big (like 100, 500, 1000). M is huge (like multiple millions or billions).
I am looking into https://en.wikipedia.org/wiki/Locality-sensitive_hashing.