0

I want to automate a script in Excel VBA and I am stuck.

I have a string = "The patient population is x. From this, a lot of patients are males. A particular male patint has 3 deadly diseases." (the strings will be longer)

Now what i want to do is search how many times the word "patient" appears in this string even if the words have spell problems and how are they written.

My ideea was to let`s say match with a confidence of 80% the word "patient" with all the words in the string and the result i am aiming for is .. there are 3 matches and the words that are in the string :"patient","patients","patint ". Is there a way to do this?

Nick Dragosh
  • 495
  • 1
  • 9
  • 21
  • You could make a dictionary of the variations of spelling and then for each word in the string check it against the array of potential matches. – Ad Wicks Jul 15 '15 at 10:35

4 Answers4

2

YMMV of course but two things to look at are:

Fuzzy Lookup Add-In for Excel

... performs fuzzy matching of textual data in Microsoft Excel. It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data.

Calculating the Levenshtein Distance may also be useful.

Community
  • 1
  • 1
Alex K.
  • 171,639
  • 30
  • 264
  • 288
1

Here is a VBA implementation of the Levenshein Distance. You can adjust the threshold to fit your needs.

Public Function Levenshtein(str1 As String, str2 As String) As Integer
On Error GoTo ErrHandler
    Dim arrLev, intLen1 As Integer, intLen2 As Integer, i As Integer
    Dim j, arrStr1, arrStr2, intMini As Integer

    intLen1 = Len(str1)
    ReDim arrStr1(intLen1 + 1)
    intLen2 = Len(str2)

    ReDim arrStr2(intLen2 + 1)
    ReDim arrLev(intLen1 + 1, intLen2 + 1)

    arrLev(0, 0) = 0
    For i = 1 To intLen1
        arrLev(i, 0) = i
        arrStr1(i) = Mid(str1, i, 1)
    Next

    For j = 1 To intLen2
        arrLev(0, j) = j
        arrStr2(j) = Mid(str2, j, 1)
    Next

    For j = 1 To intLen2
        For i = 1 To intLen1
            If arrStr1(i) = arrStr2(j) Then
                arrLev(i, j) = arrLev(i - 1, j - 1)
            Else
                intMini = arrLev(i - 1, j) 'deletion
                If intMini > arrLev(i, j - 1) Then intMini = arrLev(i, j - 1) 'insertion
                If intMini > arrLev(i - 1, j - 1) Then intMini = arrLev(i - 1, j - 1) 'deletion

                arrLev(i, j) = intMini + 1
            End If
        Next
    Next

    Levenshtein = arrLev(intLen1, intLen2)
    Exit Function

ErrHandler:
    MsgBox Err.Description
    Exit Function
End Function
Crowcoder
  • 11,250
  • 3
  • 36
  • 45
0

You could use the Soundex2 algorithm to match similar-sounding words. This SO post has some pointers on soundex in VBA.
Note that the algorithm relies on characteristics predominantly found in English.

Community
  • 1
  • 1
user1016274
  • 4,071
  • 1
  • 23
  • 19
0

The concept you are looking for is called "full text search".

I'm not a 100 % sure, but I think that is it not native in Excel nor VBA. To my best knowledge not even MS Access supports this.

Check out the Add-In suggested by Alex K or look at embedding a real database in your app.

citywall
  • 235
  • 2
  • 11