0

Background:

As part of a project I am working on which involves fuzzy string matching, I have implemented the Levenshtein Distance algorithm in VBA to calculate the "similarity" between two strings (see this question for part of the code/more insight into my project).

So, I created a table in Sheet1 in Excel with row and column headers which are strings (located in cells A2:A2146 and B1:TU1, respectively), and I am comparing these strings with the LevenshteinDistance function. The function populates the empty cells in the table (in my case, B2:TU2146) with what I call the matchScore. The idea is this: the more similar two strings are, the lower their matchScore. It follows that if two strings match exactly, we would have matchScore = 0.

(1) More specifically, suppose the value of S1 (one of my column headers) is "recursion" and the value of cell A532(one of my row headers) is "recursion". After executing my "similarity" function, the value returned in cell S532 of the table is 0.

What I want to achieve:

For the purposes of my problem and the heuristic I've defined to measure string similarity, I am particularly interested in string pairs for which matchScore <= 1 is true (this includes the example (1) above).

The data table is huge, and it is difficult for me to get visibility into the "good data" (matchScore <= 1). Thus, I want Excel to find each value in the table which is <=1 and output them in Sheet2 along with the strings that were paired off as "good matches." Therefore, there should be three columns of data in Sheet2. To reference again the example above (1), when my code finishes running, I should see 0, "recursion", and "recursion" in cells A1 through C1 (assuming this was the only "good match" I found in the table).

What I have tried to implement as a solution:

Sub FindMatches()

Dim r As Long, c As Range
r = 1

For Each c In Range("B2:BY2146").Cells
    If c.Value <= 1 Then Sheets("Sheet2").Cells(r, 1).Resize(1, 3).Value = Array(c.Value, Cells(c.Row, 1), Cells(1, c.Column))
    r = r + 1
Next c

End Sub

Effectively, this sub does nothing. Where am I wrong in the way I am attempting to implement my solution, and what can I do to solve this issue?

larryltj
  • 15
  • 7
  • Any reason you don't simply sort ascending on the result, so that the strongest matches float to the top, and the weakest to the bottom? – jeffreyweir Nov 08 '17 at 03:32
  • @jeffreyweir doing that wouldn't necessarily improve my visibility into the data just by nature of the strings I'm comparing, but it could be useful in the future! Is there a way I can build that into my sub? I have tried to do this here: `Sub Sort_Data_Ascending() Sheets("Sheet2").Range("A:C").Sort _ Key1:=Sheets("Sheet2").Range("A1") _ Order1:=xlAscending End Sub` But I have a syntax error somewhere. Any help? Am new to VBA. – larryltj Nov 08 '17 at 16:47

2 Answers2

0

Here's an updated sub:

Sub FindMatches()
    On Error GoTo errHandler

    Dim r As Long, c As Range

    Application.ScreenUpdating = False

    With Sheets("Sheet2")
        r = 1
        For Each c In Range("B2:BY2146").Cells
            If c.Value <= 1 Then
                .Cells(r, 1).Resize(1, 3).Value = Array(c.Value, Cells(c.Row, 1).Value, Cells(1, c.Column).Value)
                r = r + 1
            End If                
        Next c
    End With

Recover:
    On Error Resume Next
    Application.ScreenUpdating = True
    Exit Sub

errHandler:
    MsgBox Err.Description, vbExclamation + vbOKOnly, "Error"
    Resume Recover
End Sub

Note that Array returns a one-dimensional array, whereas .Value, when assigned an array, expects a 2D one. So I've split the assignments across 3 lines of code.

EDIT To my surprise, assigning a 1D, zero-based array to the .Value property of a single-row range works beautifully, whereas I thought a 2D, 1-based was a requirement. So my initial paragraph above is baloney and @Profex really found the issue.

A With block provides a bit more performance, and the Application.ScreenUpdating management provides lots more. It is important to reset Application.ScreenUpdating to True in case of error.

Beware of unqualified references, i.e. Range and Cells not preceded by their parent object (e.g. ... In Range("B2:BY2146").Cells and Cells(c.Row, 1); those are looking at whichever worksheet is the active one when they're invoked. In your case, if the source values are on Sheet1, you could use e.g. ... In Sheets("Sheet1").Range("B2:BY2146").Cells and Sheets("Sheet1").Cells(c.Row, 1).

Excelosaurus
  • 2,789
  • 1
  • 14
  • 20
0

Qualify sheets("sheet1") in the assignment statement & loop. Put r=r+1 inside the if statement.

Sub FindMatches()

Dim r As Long, c As Range
r = 1

For Each c In Sheets("Sheet1").Range("B2:BY2146").Cells
If c.Value <= 1 Then
    Sheets("Sheet2").Range(Cells(r, 1), Cells(r, 3)).Value = Array(c.Value, Sheets("sheet1").Cells(c.Row, 1), Sheets("sheet1").Cells(1, c.Column).Value)
    r = r + 1
End If
Next c
End Sub
Carol
  • 471
  • 4
  • 7