Issue: I need to highlight matched terms. The out-of-the-box highlighter cannot be used because we don't store the document sources in ES.
Possible solution:
- Retrieve ids from ES by search query
- Retrieve sources by ids
- Match each source against the query word by word, using the Levenshtein distance algorithm or a Lucene FSM class
Since we don't retrieve much content at a time, this should not take long.
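
To make the third step concrete, here is a minimal sketch of the word-by-word matching in plain Java, with no Lucene dependency. The class name HighlightSketch, the method names, the whitespace tokenization, the case-insensitive comparison and the <em> markers are all assumptions made for illustration; they are not part of any ES or Lucene API.

```java
import java.util.List;

class HighlightSketch {

    // Dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the token is within maxEdits of at least one query term.
    static boolean matchesAny(String token, List<String> queryTerms, int maxEdits) {
        String lowered = token.toLowerCase();
        for (String term : queryTerms) {
            if (levenshtein(lowered, term.toLowerCase()) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    // Splits the source on whitespace (an assumed, naive tokenizer) and
    // wraps every matching token in <em>...</em>.
    static String highlight(String source, List<String> queryTerms, int maxEdits) {
        StringBuilder out = new StringBuilder();
        for (String token : source.split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (matchesAny(token, queryTerms, maxEdits)) {
                out.append("<em>").append(token).append("</em>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }
}
```

This costs one distance computation per (token, query term) pair, which is acceptable for a handful of small documents but is exactly the part an automaton would speed up.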
The question is the following:
Does the Lucene library contain an FSM/automaton that can represent a dictionary? The desired solution is to get a Lucene automaton representing the dictionary and feed the query to it term by term. The automaton should accept terms that are contained in the dictionary, and it should also accept terms within a given edit distance of a dictionary term.
While searching for a solution I found Lucene classes such as LevenshteinAutomata and FuzzyQuery. But LevenshteinAutomata (as I understand it) represents only one term, so for several terms I would need several automata.