I use utl_match.jaro_winkler to compare company names. In most cases it works fine, but sometimes I get pretty weird results.
For example, this returns 0.62:
SELECT utl_match.jaro_winkler('ГОРОДСКАЯ КЛИНИЧЕСКАЯ БОЛЬНИЦА 18', 'ДИНА') FROM dual;
These names are completely different in both length and characters! How can the similarity be 0.62?
Another example:
SELECT utl_match.jaro_winkler('ООО МЕГИ', 'МЕГИ') FROM dual;
This returns 0, even though the two strings are very similar.
It feels like I should use something more advanced than just upper() and utl_match.jaro_winkler(), but I have no idea what exactly.
What would you recommend? What are the best practices for comparing two strings? Where can I read about this?