Trying to figure out a way to calculate the minimum percentage match when comparing a string to a column.
Example:
Column A    Column B
Key         Keylime
Key Chain   Status
            Serious
            Extreme
            Key
Where the expected output is:
Column A    Column B    Column C    Column D
Key         Temp        100%        Key
Key Chain   Status      66.7%       Key Ch
Ten         Key Ch      100%        Tenure
            Extreme
            Key
            Tenure
To expand on this:
- Column A is the column with strings to individually match
- Column B is the reference column
- Column C provides the highest percent match the column A string has with any string in column B.
- Column D provides the word from column B associated with the highest percent match
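As a rough illustration of how columns C and D could be computed, here is a minimal Python sketch. It assumes "percent match" means the number of leading characters a column A string shares with a column B string, divided by the length of the column A string. That reading is an assumption, chosen because it is consistent with the figures in the table. The function names `shared_prefix_len` and `best_match` are hypothetical, and breaking ties in favor of the shorter column B string is also an assumption.

```python
def shared_prefix_len(a, b):
    """Count the leading characters that a and b have in common."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count


def best_match(candidate, references):
    """Return (score, reference) for the best-scoring reference string.

    score = shared leading characters / len(candidate)  (assumed definition).
    Ties go to the shorter reference string (assumed tie-break).
    """
    scored = [
        (shared_prefix_len(candidate, ref) / len(candidate), ref)
        for ref in references
        if candidate and ref  # skip empty cells, avoids division by zero
    ]
    return max(scored, key=lambda pair: (pair[0], -len(pair[1])),
               default=(0.0, None))


column_a = ["Key", "Key Chain", "Ten"]
column_b = ["Temp", "Status", "Key Ch", "Extreme", "Key", "Tenure"]

for text in column_a:
    score, ref = best_match(text, column_b)
    print(f"{text!r}: {score:.1%} -> {ref!r}")
```

Under these assumptions the loop prints 100.0% with 'Key', 66.7% with 'Key Ch', and 100.0% with 'Tenure', matching columns C and D in the table.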
To expand on Column C, take "Key Chain" as an example:
- The highest match it has against any string in column B is "Key Ch", where 6 out of the 9 characters of "Key Chain" (including the space) match, giving a percentage match of 6/9 = 66.7%.
- That being said, one thing sticks out, though it isn't a deal breaker: the logic above has no way to penalize a match like "Ten". All 3 of its 3 characters match against "Tenure", giving it an inflated 100% match that I still can't think of a way to correct.
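One possible way to penalize a case like "Ten" versus "Tenure", sketched here in Python, is to divide the shared leading characters by the length of the longer string rather than by the length of the column A string. This is only one option and is not part of the logic described above. The function name `penalized_match` is hypothetical, and treating a "match" as shared leading characters is an assumption.

```python
def shared_prefix_len(a, b):
    """Count the leading characters that a and b have in common."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count


def penalized_match(candidate, reference):
    """Shared leading characters divided by the longer of the two lengths.

    Dividing by the longer string means a short candidate no longer scores
    100% just because it is fully contained at the start of a longer word.
    """
    longest = max(len(candidate), len(reference))
    if longest == 0:
        return 0.0
    return shared_prefix_len(candidate, reference) / longest


print(f"{penalized_match('Ten', 'Tenure'):.1%}")        # 3 / 6
print(f"{penalized_match('Key Chain', 'Key Ch'):.1%}")  # 6 / 9, unchanged
print(f"{penalized_match('Key', 'Key'):.1%}")           # 3 / 3, exact match
```

With this normalization "Ten" against "Tenure" drops to 50.0%, "Key Chain" against "Key Ch" stays at 66.7%, and only an exact match such as "Key" against "Key" reaches 100.0%. The trade-off is that a string in column A is now also penalized whenever the column B string is longer than it, which may or may not be what you want.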