Does Vespa support string-matching comparators such as Levenshtein, Jaro–Winkler, Soundex, etc.? Is there any way to implement them as plugins, as some are available in Elasticsearch? What approaches are there for this type of search?
The match modes supported by Vespa are documented at https://docs.vespa.ai/documentation/reference/schema-reference.html#match. In addition, attribute fields support regular-expression matching: https://docs.vespa.ai/documentation/reference/query-language-reference.html#matches
None of the mentioned string matching/ranking algorithms are supported out of the box. Both edit-distance variants sound more like text ranking features, which should be easy to implement. (Open a GitHub issue at https://github.com/vespa-engine/vespa/issues.)
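To illustrate what such a ranking feature would compute, here is a minimal plain-Java sketch of Levenshtein distance and Jaro–Winkler similarity. It is not a Vespa API: the class `StringSimilarity` and its method names are hypothetical, the Winkler prefix bonus uses the common defaults (scaling factor 0.1, prefix of at most 4 characters), and the code compares UTF-16 `char` values, so it is not fully Unicode-aware.

```java
/**
 * Hypothetical helper (not part of any Vespa API) computing two common
 * string-similarity measures. It compares UTF-16 chars, so characters outside
 * the Basic Multilingual Plane count as two positions.
 */
final class StringSimilarity {
    private StringSimilarity() {}

    /** Levenshtein distance: minimum number of single-character insertions, deletions and substitutions. */
    static int levenshtein(String a, String b) {
        // Two-row dynamic programming: prev holds row i-1, curr holds row i.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Jaro similarity in [0, 1]; 1.0 means identical strings. */
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        // Characters only "match" if they are within this distance of each other.
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] m1 = new boolean[s1.length()];
        boolean[] m2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(i + window + 1, s2.length());
            for (int j = start; j < end; j++) {
                if (!m2[j] && s1.charAt(i) == s2.charAt(j)) {
                    m1[i] = true;
                    m2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order; half of them are transpositions.
        int outOfOrder = 0;
        for (int i = 0, k = 0; i < s1.length(); i++) {
            if (!m1[i]) continue;
            while (!m2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / s1.length() + m / s2.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Jaro–Winkler: boosts the Jaro score for a shared prefix of up to 4 characters. */
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

For example, `levenshtein("kitten", "sitting")` is 3 and `jaroWinkler("MARTHA", "MARHTA")` is about 0.961. Because both measures need the full query string and the full field value, a container-side version would only be practical as a re-ranking step over a limited set of candidate hits.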
Matching in Vespa happens in a C++ component, so there is no plugin support there yet.
You can, however, deploy a plugin written in Java in the container by deploying a custom searcher (https://docs.vespa.ai/documentation/searcher-development.html). The searcher can then work on the top k hits, after candidate documents have been retrieved using, e.g., regular-expression or n-gram matching. The Soundex algorithm can be implemented accurately by combining a searcher with a document processor.
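One plausible way to combine the two components for Soundex: a document processor stores the Soundex code of a name in an extra field at feed time, and a searcher replaces the query term with its Soundex code, so ordinary exact matching does the rest. The sketch below shows only the shared encoding (American Soundex) plus a small in-memory filter; the class `Soundex` and its methods are hypothetical, and the wiring into Vespa's document processor and searcher classes is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not a Vespa API) implementing American Soundex.
 * The same encode() would have to run at feed time and at query time
 * so that stored codes and query codes agree.
 */
final class Soundex {
    // Codes for A..Z: '0' = A, E, I, O, U, Y (not coded); '-' = H or W (not coded).
    private static final String CODES = "0123012-02245501262301-202";

    private Soundex() {}

    /** Returns a 4-character code such as "R163", or "" if the input has no ASCII letters. */
    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(4);
        char lastCode = 0;
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') continue; // skip digits, punctuation and non-ASCII letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);    // the first letter is kept as a letter
                lastCode = code;  // but its code still suppresses an identical next code
            } else if (code == '0') {
                lastCode = '0';   // vowels are dropped but separate equal codes
            } else if (code != '-' && code != lastCode) {
                out.append(code); // H and W are dropped without separating equal codes
                lastCode = code;
            }
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0'); // pad short codes, e.g. "L" -> "L000"
        return out.toString();
    }

    /** Hypothetical filtering step: keep candidates whose Soundex code equals the query's. */
    static List<String> matching(String query, List<String> candidates) {
        String q = encode(query);
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (!q.isEmpty() && encode(candidate).equals(q)) result.add(candidate);
        }
        return result;
    }
}
```

With this encoding, "Robert" and "Rupert" both map to "R163" while "Rubin" maps to "R150". If the code is stored as its own field, the Soundex comparison would reduce to exact term matching, which the C++ matching layer already supports.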
