0

My data consist of 20 values: 13 doubles, 6 integer and one string, pr.row. The search is performed on 13 doubles. And I need to find the closest 10 rows to the input of 13. I need it to be fast (less or equal to 1ms). The data I store is around 200000 rows. Everything needs to be executed on one machine.

I can achieve a high speed(<1ms per query) using MySQL and exact search. When I try to look for the closest in MySQL it takes around 20ms per query, which for my application is too slow.

Is it possible to use elasticsearch in this context? Could it ever give me <=1ms?

Lulu
  • 3
  • 1
  • How do you define "closest" when you are comparing 13 different values across your 200,000 rows? (Smallest of: |delta1| + |delta2| + ... + |delta13| ?) – Peter Dixon-Moses Sep 23 '15 at 15:55
  • If you can **index** *the thing you are comparing* directly you will see very fast response time using either MySQL or Elasticsearch. Otherwise, if either tool needs to scan all rows to dynamically compute a comparison-value at query-time, performance will be proportional to the computation and collection-size. – Peter Dixon-Moses Sep 23 '15 at 16:01
  • Thank you for your answer. Yes, using Euclidean distance formula. I can index fields. This helps a lot but still it is not fast enough. Yes, I need to "loop" through all fields. Then it seems it cannot be faster. – Lulu Sep 24 '15 at 09:23

1 Answers1

0

Elasticsearch can't help you optimize your problem unless you can quickly identify a small subset of nearby records (e.g. using Locality Sensitive Hashing) to make your Euclidean Distance calculation cheaper.

Here is a good discussion on different kNN approximations: Nearest neighbors in high-dimensional data?.

And a whitepaper: http://web.mit.edu/andoni/www/papers/cSquared.pdf

Once you have your data hashed, and the hashcodes indexed, it probably doesn't matter whether it's in MySQL or Elasticsearch, since you'll be able to quickly narrow your initial selection to a small neighborhood of data points instead of computing Euclidean Distance over the entire set of 200,000.

Community
  • 1
  • 1
Peter Dixon-Moses
  • 3,169
  • 14
  • 18