Questions tagged [approximate-nn-searching]

Approximate nearest neighbor search (ANNS), also known as proximity search, similarity search, or closest-point search, is an optimization problem: finding the points in a metric space that are closest to a given query point.

ANN search is of interest because exact NNS can become very expensive in execution time when the dataset is large, both in the number of points and especially in the number of dimensions (the curse of dimensionality).

As a result, we accept a trade-off between speed and accuracy: the faster the search, the lower the accuracy. In other words, by performing a relatively fast search we may not find the exact NN, only a close approximation.

For example, in the picture below:

[figure: a query point, its exact nearest neighbor marked with an asterisk, and an approximate result labeled 2]

the point marked with an asterisk is the exact NN, but the ANN search may instead return the point labeled 2, which is not the actual closest point to the query, but an approximation.

For more, see the Wikipedia article on nearest neighbor search.
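The trade-off described above can be illustrated with a minimal sketch (not from the tag wiki): exact NN by brute force versus a toy approximate search using random-hyperplane hashing, a simple form of LSH. All function names here are illustrative assumptions, and a real index would build its buckets once rather than per query.

```python
import math
import random

random.seed(0)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nn(points, query):
    """Brute force: always correct, O(n * d) per query."""
    return min(points, key=lambda p: euclidean(p, query))

def make_hash(dim, n_planes=8):
    """Map a point to a bit signature via random hyperplanes."""
    planes = [[random.gauss(0, 1) for _ in range(dim)]
              for _ in range(n_planes)]
    def h(p):
        return tuple(sum(x * w for x, w in zip(p, plane)) >= 0
                     for plane in planes)
    return h

def approx_nn(points, query, h):
    """Scan only the query's bucket; may miss the true NN."""
    buckets = {}
    for p in points:
        buckets.setdefault(h(p), []).append(p)
    # Fall back to a full scan if the query's bucket is empty.
    candidates = buckets.get(h(query)) or points
    return min(candidates, key=lambda p: euclidean(p, query))

points = [tuple(random.uniform(-1, 1) for _ in range(10))
          for _ in range(500)]
query = tuple(random.uniform(-1, 1) for _ in range(10))
h = make_hash(10)

exact = exact_nn(points, query)
approx = approx_nn(points, query, h)
# The approximate result is never closer than the exact NN.
assert euclidean(approx, query) >= euclidean(exact, query)
```

The approximate search inspects only one bucket instead of all 500 points, which is where the speed comes from; the price is that the true nearest neighbor may land in a different bucket.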

44 questions
17
votes
5 answers

How to find the closest 2 points in a 100 dimensional space with 500,000 points?

I have a database of 500,000 points in a 100-dimensional space, and I want to find the 2 closest points. How do I do it? Update: the space is Euclidean, sorry. And thanks for all the answers. BTW this is not homework.
4
votes
0 answers

Nearest neighbor search over Levenshtein distance in Python using metric indexing

I would like to perform nearest-neighbor search over the Levenshtein distance with a dataset of texts in Python. The search can be approximate, should be implemented using core Python libraries, and queries should be near-constant-time operations.…
4
votes
2 answers

Best data structure for high dimensional nearest neighbor search

I'm currently working on high-dimensional data (~50,000-100,000 features) on which nearest-neighbor search must be performed. I know that KD-trees have poor performance as the dimension grows, and I've also read that in general, all space-partitioning…
3
votes
1 answer

Using an "approximate" STL map

I would like to create an STL map to find whether an item is close enough to another item in 3 dimensional space. So far, my "less-than-functor" has worked quite well, pasted to the following link. Now this problem isn't quite the "nearest neighbor"…
macetw
3
votes
1 answer

Reinforcement Learning in arbitrarily large action/state spaces

I'm interested in using Deep Reinforcement Learning to find a - unique - optimal path back home among (too many) possibilities and a few (required) intermediate stops (for instance, buy a coffee or refuel). Furthermore, I want to apply this…
3
votes
1 answer

Approximate string matching in Ruby

I am implementing user search functionality in my Rails application. I want the application to suggest the correct spelling if the user makes a typing mistake. Is there any plugin for this in Ruby? Can this be done in…
Pankaj
3
votes
1 answer

Performance of Annoy method Vs. KD-Tree

I am doing research on approximate nearest neighbor algorithms. I recently found the Annoy library, which does an amazing job of finding KNN at reasonable speed. For deeper analysis, you can skim the meetup slides. After reading the slides and…
2
votes
1 answer

Embedding version control of the weaviate

In Weaviate, the vector search engine, I wonder how it handles versioning of the embedding model. For instance, considering a (trained) word2vec model, vectors embedded by different models must be kept separate. One option might be to make…
2
votes
1 answer

How does filtered document list lookup work in nearest-neighbour search pre-filtering?

In pre-filter based ANN, once we have the list of documents after applying the pre-filter, Vespa starts the HNSW algorithm to find nearest neighbours. In the HNSW algorithm, Vespa starts with a node and looks for the neighbours that are present in the pre-filter list,…
tourism
2
votes
1 answer

How does FLANN select what algorithm and parameters to use?

FLANN (Fast Library for Approximate Nearest Neighbors) is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a…
harkib
2
votes
2 answers

Populating an array from a .txt file

I'm trying to populate an array from a .txt file that I am reading. This is the code I am using as a function to read the file: double* read_text(const char *fileName, int sizeR, int sizeC) { double* data = new double[sizeR*sizeC]; int i…
Hparsons28
2
votes
1 answer

On hash functions application in cryptography vs feature

I have only started learning feature hashing, so I need help understanding whether I can apply the hash function expressed mathematically as the tent map, https://en.wikipedia.org/wiki/Tent_map. One such application of the tent map is in cryptography -- Paper 1:…
2
votes
1 answer

Utilizing FLANN library for Multi-Label Classification

I want to utilize the FLANN library for Multi-Label Classification. I know that the FLANN library is for computing nearest neighbors, but I'm not sure how to use it for classification purposes. Is there some way to plug it into the Scikit-Learn…
2
votes
1 answer

Search in locality sensitive hashing

I'm trying to understand Section 5 of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper: Given bit vectors consisting of d bits each, we choose N = O(n^(1/(1+ε))) random permutations of the…
2
votes
1 answer

What is the ε (epsilon) parameter in Locality Sensitive Hashing (LSH)?

I've read the original paper about Locality Sensitive Hashing. The complexity is a function of the parameter ε, but I don't understand what it is. Can you explain its meaning, please?