I want to store thousands of ~100-element vectors in a database, and then search for the record with the smallest difference from a given query vector.

e.g. when comparing [4,9,3] and [5,7,2], take the element-wise diff: [-1,2,1] and then compute the Euclidean length: sqrt(1+4+1) = 2.45.
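To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The variable names are only illustrative.

```python
import math

a = [4, 9, 3]
b = [5, 7, 2]

# Element-wise difference between the two vectors.
diff = [x - y for x, y in zip(a, b)]

# Euclidean length of the difference vector.
length = math.sqrt(sum(d * d for d in diff))

print(diff, round(length, 2))  # prints: [-1, 2, 1] 2.45
```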

I need to be able to search for the record containing this lowest value.

I don't think I can do this efficiently in MySQL. I hear Solr or Elasticsearch might provide a solution; can someone point me towards, or post, an example of how this kind of search can be done (efficiently)?

mpen

1 Answer

I think the answer to your question is here

This link is also quite interesting

Unfortunately, in general you have to compare the input vector against every other vector in the database. If you know something more about your data, you may be able to partition it into smaller subsets of vectors and reduce the number of comparisons.
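As a rough illustration of that exhaustive comparison, here is a minimal Python sketch. It assumes the records have already been loaded from the database as `(id, vector)` pairs; the names `squared_distance` and `nearest` are hypothetical helpers, not part of any database API. Comparing squared distances avoids the square root without changing which record wins.

```python
import math


def squared_distance(a, b):
    """Sum of squared element-wise differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, records):
    """Return (record_id, distance) of the record closest to `query`.

    `records` is an iterable of (record_id, vector) pairs (assumed layout).
    Every record is visited once, so the cost is O(N * D) for N records
    of D elements each. Raises ValueError if `records` is empty.
    """
    best_id, best_vec = min(records, key=lambda rec: squared_distance(query, rec[1]))
    return best_id, math.sqrt(squared_distance(query, best_vec))


# Hypothetical sample data for illustration only.
records = [
    ("a", [5, 7, 2]),
    ("b", [0, 0, 0]),
    ("c", [4, 9, 4]),
]
best_id, best_dist = nearest([4, 9, 3], records)
```

With a few thousand vectors of about 100 elements, a scan like this is only a few hundred thousand multiplications per query, so it may be fast enough before any partitioning is needed.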

In PostgreSQL, you can use its C++ extensibility to write your own function, like here, or use K-Nearest-Neighbor indexing. If a GPU is available, you can also look at GPU-based PostgreSQL Extensions for Scalable High-throughput Pattern Matching, which describes extensions for PostgreSQL.

Krivers