Issue: I need to highlight matched terms. The out-of-the-box solution cannot be applied because we don't keep the sources in ES.

Possible solution:

  1. Retrieve ids from ES with the search query
  2. Retrieve the sources by id
  3. Match each source against the query word by word, using the Levenshtein distance algorithm or a Lucene FSM class
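
Step 3 could be sketched roughly as follows. This is only a minimal plain-Java illustration, not Lucene's implementation: the class name `FuzzyHighlighter`, the `<em>` markup, the letter-only tokenization, and the `maxEdits` threshold are all assumptions made for the example. Real tokenization would have to mirror whatever analyzer the ES index uses.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: highlights words of a source text that are within
// maxEdits Levenshtein edits of any query term.
class FuzzyHighlighter {

    // Two-row dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Wraps every matching word in <em>...</em>; everything else is copied as-is.
    // Assumption: a "word" is a run of letters, compared case-insensitively.
    public static String highlight(String source, List<String> queryTerms, int maxEdits) {
        Matcher words = Pattern.compile("\\p{L}+").matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (words.find()) {
            String word = words.group();
            String lower = word.toLowerCase(Locale.ROOT);
            boolean hit = false;
            for (String term : queryTerms) {
                if (levenshtein(lower, term.toLowerCase(Locale.ROOT)) <= maxEdits) {
                    hit = true;
                    break;
                }
            }
            out.append(source, last, words.start());
            out.append(hit ? "<em>" + word + "</em>" : word);
            last = words.end();
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lucene builds Levenstein automata",
                List.of("levenshtein", "automaton"), 2));
    }
}
```

This compares every source word against every query term, so the cost grows with (words × terms). That is acceptable only as long as the retrieved content stays small.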

Since we don't retrieve much content at a time, this should not take long.

The question is the following:

Does the Lucene library contain an FSM/automaton that can represent a dictionary? The desired solution is to get a Lucene automaton representing the dictionary and feed the query to it term by term. The automaton should accept terms that are contained in the dictionary, and edit distance should be taken into account as well.

Searching for a solution, I found Lucene classes like LevenshteinAutomata and FuzzyQuery. But LevenshteinAutomata (as I understand it) represents only one term, so for several terms I would need several automata.

Alkis Kalogeris
katrin
  • So you want autocomplete functionality, correct? Can you have a look at https://opster.com/elasticsearch-glossary/elasticsearch-autocomplete-troubleshooting-guide/ and https://stackoverflow.com/questions/60584099/autocomplete-with-java-redis-elastic-search-mongo/60584211#60584211 and tell me which one suits your requirements? Then I can help accordingly. – Amit Mar 30 '20 at 09:20
  • Thank you for the response! Actually, I am trying to find a better solution for the highlighting issue. We don't keep the sources in Elasticsearch, so highlighting doesn't work out of the box. There are several ways to solve this issue, and one is to check how the query terms match the content. So if I could represent the content as an automaton and feed the query to it, I would get the terms to be highlighted; or vice versa, represent the query and feed the content to it. Since we don't retrieve much content at a time, it should not take long. – katrin Mar 30 '20 at 09:39
  • Got it. Can you please update the question with all these details? I won't be able to help much here, but somebody from the community should be able to help. – Amit Mar 30 '20 at 09:48