3

I have a row of 10,000 names in two corresponding columns, 10,000 in each. Each cell in Column A corresponds to the adjacent cell in Column B. I want to do a fuzzy match and get a compatibility score on all of them just with the adjacent cell. I do not want it to search entire column versus entire column, just adjacent cells, which I don't seem to be able to do with the Fuzzy Match Excel add in, ideas?

Example:

Column A:       Column B:        Value:
Apple           Aplle            80%
Banana          Banana           100%
Orange          Ornge            85%   
pnuts
  • 58,317
  • 11
  • 87
  • 139
Seamus
  • 39
  • 3
  • How would you like the values in the third column to be calculated?? – Gary's Student Nov 20 '15 at 13:25
  • Hi, I just want a confidence score as to how similar the names are? Some may be an exact match others may be completely different. Exactly like the Fuzzy Match tool but I don't think I can set that to just do adjacent cells, it only calculates column versus column. Thanks! – Seamus Nov 20 '15 at 13:35
  • This would be pretty convoluted with formulas; a non-vba solution would probably need a number of helper columns equal to the max number of letters being compared. I doubt that's what you're looking for. The problem is that you want Apple to show as only 1 letter off when the 'p' is changed to 'l', but you want Orange to show as only 1 letter off when a letter is missing [I believe these is simply Levenshtein distance]. It would be difficult to show that Orange is not 4 letters different using Excel formulas, if one simply compares them letter by letter. – Grade 'Eh' Bacon Nov 20 '15 at 14:43

2 Answers2

2

Well, I don't know about Fuzzy Match Addin but you can accomplish similar to your requirement using UDF.

Something like this based on your sample data

Function FuzzyComparision(String1 As String, String2 As String) As Double
Dim intStringLength As Integer
Dim dblScore As Double
Dim dblUnitScore As Double
Dim intCounter As Integer

intStringLength = WorksheetFunction.Max(Len(String1), Len(String2))
dblUnitScore = 1 / intStringLength
dblScore = 0#

For intCounter = 1 To intStringLength
    If Mid(UCase(String1), intCounter, 1) = Mid(UCase(String2), intCounter, 1) Then
        dblScore = dblScore + dblUnitScore
    Else
        If Len(String1) <> Len(String2) And intCounter < intStringLength Then
            If Mid(UCase(String1), intCounter + 1, 1) = Mid(UCase(String2), intCounter, 1) Then
                dblScore = dblScore + dblUnitScore
            End If
        End If
    End If
Next

FuzzyComparision = dblScore
End Function

Of course the comparison can be further improvised as required. Let me know if that helps

Edit: as Grade Bacon pointed (thanks for that observation), this does not work as expected for missing letters. I have tweaked the function a bit to do a case insensitive comparison and compensate for missing letters as in example.

Even though it works, utility depends on acceptable score deviation +/-. OP may shed some light.

PankajR
  • 407
  • 3
  • 9
  • Works nice. +1 for quality and effort. – SQL Police Nov 20 '15 at 15:51
  • Does this actually work for the case where a letter is removed from one of the words? See his "Orange" above - it shows that only 1 letter is expected to be 'taken off' from 100%. Under your method, wouldn't it count 4 differences, starting from the missing letter? – Grade 'Eh' Bacon Nov 20 '15 at 15:58
0

Import this VBA module: https://github.com/kyledeer-32/vba_fuzzymatching

It contains several Fuzzy Matching UDFs - the one your specifically looking for is =String_Similarity. It will give you the match percentage between two strings. You can also use =Fuzzy_Match for matching one string (one cell) to an array of cells (one column).

Here are my results using applying this VBA module to your example data:

enter image description here

Here is a formula view:

enter image description here

Note: after importing this module, you will need to enable the "Microsoft Scripting Runtime" library in the Visual Basic Editor Window it to run. Steps to do this (takes less than a minute):

From Excel Workbook:

  • Select Developer tab on ribbon
  • Select Visual Basic
  • Select Tools on the Toolbar
  • Select References
  • Scroll down until you see Microsoft Scripting Runtime, then check the box
  • Press OK
Kyle Deer
  • 91
  • 8