
I have a list of strings, some of which have been modified since my previous release. Some of the changes are trivial (spacing, a one-word difference, etc.). I would like to detect strings that have only "minor" differences, so that I can reuse the older translations wherever possible.

What do I mean by "minor differences"? I will not know until I start working with the database.

Do you know of any tunable routines that will indicate when two strings are similar but not identical? Any routines that will return a number indicating how different two strings are?

jon bondy
  • You're going to need a way to grade how similar strings are. There are a million ways to do it. Here's a thread, see the various answers: http://stackoverflow.com/questions/4323977/string-similarity-score-hash – Jonathan M May 01 '12 at 19:13
  • It sure would be cool if you found something new that wasn't in the links above or below. Please come back and tell us what you did. – Warren P May 03 '12 at 01:25

1 Answer


There are many such algorithms. The keyword to search for is "fuzzy string matching".

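A well-known one is the Levenshtein distance. It counts the minimum number of single-character "changes" (insertions, deletions, or substitutions) required to transform one string into another, which gives you a number estimating how similar the strings are: the smaller the distance, the more alike they are.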
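To make that concrete, here is a minimal sketch of the distance together with a tunable "minor change" check. It is written in Python purely for illustration (the loop structure should port to Delphi fairly directly). The helper names `similarity` and `is_minor_change`, the whitespace normalization, and the default threshold of 0.9 are my own assumptions, not part of any library; you would tune the threshold against your real strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to a score from 0.0 (nothing in common) to 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


def is_minor_change(old: str, new: str, threshold: float = 0.9) -> bool:
    """Hypothetical helper: True if new is close enough to old to reuse its translation.
    The 0.9 threshold is an arbitrary starting point, meant to be tuned."""
    # Collapse runs of whitespace first so spacing-only edits count as no change.
    old_norm = " ".join(old.split())
    new_norm = " ".join(new.split())
    return similarity(old_norm, new_norm) >= threshold
```

Dividing by the longer length makes the score comparable across short and long strings, so a single threshold can serve as the tuning knob the question asks for. If whole-word changes matter more to you than character edits, the same function can be run over lists of words instead of characters.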

For solutions in Delphi, see also this question: How to search for similar words.

Community
  • See also [how-do-you-implement-levenshtein-distance-in-delphi](http://stackoverflow.com/q/54797/576719). – LU RD May 01 '12 at 19:44