We're trying to choose a string metric for our string comparison program. Which string metric would be best for detecting misspellings and alterations of a word, such as changing letters to words or symbols, adding extra letters, or reversing the word? Sorry for my terrible English.
-
Have a look at https://en.wikipedia.org/wiki/Edit_distance . The cost function should reflect how likely each kind of error is in your data, so 'best' always depends on the domain you use it for. – MrSmith42 Nov 24 '21 at 15:21
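To illustrate the point about cost functions, here is a rough sketch of an edit distance where the substitution cost is pluggable. The `LOOKALIKES` table, the `sub_cost` function, and the 0.2 weight are all made-up assumptions for illustration; real weights would have to come from the errors you actually see in your data.

```python
from typing import Callable

# Hypothetical look-alike pairs (letter <-> symbol); purely illustrative.
LOOKALIKES = {("a", "@"), ("o", "0"), ("e", "3"), ("i", "1"), ("s", "$")}


def sub_cost(a: str, b: str) -> float:
    """Assumed substitution cost: free if equal, cheap for look-alikes."""
    if a == b:
        return 0.0
    if (a, b) in LOOKALIKES or (b, a) in LOOKALIKES:
        return 0.2  # arbitrary weight chosen for the example
    return 1.0


def weighted_edit_distance(
    s: str,
    t: str,
    sub: Callable[[str, str], float] = sub_cost,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Edit distance via dynamic programming with configurable costs."""
    # prev[j] = cost of turning the empty prefix of s into t[:j]
    prev = [j * ins_cost for j in range(len(t) + 1)]
    for i, cs in enumerate(s, start=1):
        curr = [i * del_cost]  # turning s[:i] into the empty string
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + del_cost,          # delete cs
                curr[j - 1] + ins_cost,      # insert ct
                prev[j - 1] + sub(cs, ct),   # substitute (or match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(weighted_edit_distance("password", "p@ssw0rd"))
    print(weighted_edit_distance("password", "passwxrd"))
```

With these assumed weights, a symbol swap such as `a` to `@` counts as a much smaller change than an arbitrary substitution, which is one way to encode "this kind of alteration is likely in my domain".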
-
Can you elaborate more? I'm kind of confused about this topic to be honest... – Quote Jan 05 '22 at 14:52
-
One of the simplest edit distances is the Levenshtein distance (you can find plenty of examples, documentation, and implementations of it on the web). This might be a good start. – MrSmith42 Jan 11 '22 at 08:10
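For reference, a minimal sketch of the Levenshtein distance using the usual two-row dynamic programming approach (the function name `levenshtein` is just my choice here, not from any particular library):

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    # Keep the shorter string as t so the rows stay small.
    if len(s) < len(t):
        s, t = t, s
    # prev[j] = distance between the empty string and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance between s[:i] and the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))   # 3
    print(levenshtein("receive", "recieve"))  # 2
```

Note that swapping two adjacent letters ("receive" vs "recieve") costs 2 here, because plain Levenshtein has no transposition operation. That may matter if swapped letters are common in your data.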
-
Our professor told us to prove why the chosen algorithm is relevant for this type of problem. We're currently stuck on this question: there are many other string metric algorithms, but we can't find any significant papers to support a choice. Any advice would be appreciated. – Quote Jan 11 '22 at 12:13