4

I'm looking for a tool that compares two text strings and returns a measure of their similarity (e.g. 95%). It must run on a platform that supports Java libraries.

My best guess is that I need some kind of fuzzy-matching tool that compares the two strings and returns their similarity level.

I've seen some posts here about fuzzy search, but I need the exact opposite: I don't want to set some parameters and get similar entries back. Instead, I already have the entries on hand and need to derive the similarity value from them.

Can you advise me on that? Many thanks

mikolajek

2 Answers

2

Apache Commons Lang's StringUtils has a method that computes the Levenshtein distance: http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/StringUtils.html

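Levenshtein distance measures how different two strings are as an "edit distance": the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, so a lower number means more similar strings. I'm not sure whether this counts as "fuzzy", though.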

Example: int distance = StringUtils.getLevenshteinDistance("cat", "hat");
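
Since the question asks for a percentage rather than a raw distance, here is a minimal sketch of one common way to get there, written without any library so it runs on its own. The class name StringSimilarity and both method names are made up for illustration, and dividing by the length of the longer string is just one possible normalization (an assumption on my part, not something StringUtils does for you).

```java
// Hypothetical helper class; the names are illustrative, not from any library.
final class StringSimilarity {

    // Levenshtein distance via dynamic programming, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    // Returns a value between 0.0 (nothing in common) and 1.0 (identical).
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("cat", "hat"));      // prints 1
        System.out.println(similarity("cat", "hat") * 100); // roughly 66.7 (percent)
    }
}
```

With this normalization "cat" vs "hat" comes out at about 67%, because one of three characters differs. If you already use the Apache library, you could probably swap the hand-written levenshtein method for the library call and keep only the normalization step.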

mrQWERTY
    That method is deprecated; Levenshtein distance now lives in the Apache Commons Text library (org.apache.commons.text.similarity), e.g. int dist = new LevenshteinDistance().apply("cat", "hat") – Sanoj Aug 25 '22 at 12:12
2

There is now a library that does exactly that: https://github.com/intuit/fuzzy-matcher

mob