0

I am creating a macro to search word documents for exact matches against acronyms in an Excel file. If the acronym is in the Word file, the macro highlights the acronym and inserts the row number into an array for use in a to-be-written macro to generate a Table of Acronyms.

The below macro works, however there are several false positives whenever I run it. This occurs when certain acronyms contain special characters, notably "&", "/" and "-".

For example, if I run the below macro on a file that contains RT&E, the code will insert the row number for "RT and "RT&E" and "T&E" into the array (provided all three are in the first column in the excel file).

This is not a problem on small documents, but for 150 page documents, it's just too much. I also apologize for the bad code. Suggestions to make it better are appreciated.

    Dim rng As range
    Dim i As Long
    Dim acro As String
    Dim acrolist As Excel.Application
    Dim acrobook As Excel.Workbook
    Dim acromatch() As Variant

    ReDim acromatch(0 To 1)

    Set acrolist = New Excel.Application
    Set acrobook = acrolist.Workbooks.Open("P:\AcronymMacro\MasterAcronymList.xlsm")

        ' Count from first row with acronym to maximum # of rows
        ' That way, list can be as long or short as needed

        For i = 3 To 1048576
        Set rng = ActiveDocument.range
        acro = acrobook.Sheets(1).Cells(i + 1, 1)

        ' Loop breaks when it finds an empty cell
        ' i.e. the last acronym in the document.

        If acro = "" Then Exit For

        ' Find and Replace code

        With rng.Find
        .Text = acro
        .Format = True
        .MatchCase = True
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False

        ' Do While loop            

        Do While .Execute(Forward:=True) = True
        rng.HighlightColorIndex = wdPink

        Call InsertIntoArray(acromatch(), i + 1)

        Loop

        End With
        Next

    MsgBox Join(acromatch(), ",")

    'Make sure you close your files, ladies and gentlemen!

    acrobook.Close False
    Set acrolist = Nothing
    Set acrobook = Nothing

   ' This function resizes array and insert value as last value

    Public Function InsertIntoArray(InputArray As Variant, Value As Variant)

     ReDim Preserve InputArray(LBound(InputArray) To UBound(InputArray) + 1)
     InputArray(UBound(InputArray)) = Value

    End Function

One thing I tried was to run another Range.Find method in the Do While Loop, with a slight change to the acronym. For instance the below code makes sure there is a space, period, or close parentheses and not an ampersand and hyphen after the acronym. If it is different, then it doesn't get added.

   Do While .Execute(Forward:=True) = True
        rng.HighlightColorIndex = wdPink
        acro = acro + "[ .)]"
        With rng.Find
          .Text = acro
          .MatchWildCards = True
        If rng.Find.Execute(Forward=True) = True Then Call InsertIntoArray(acromatch(), i + 1)
        End With
        Loop

This code, however, makes sure nothing gets into the array.

How to I present false positives when acronyms have special characters in acronyms?

dockert
  • 1
  • 3
  • A bit ugly but you *could* - with a throwaway version of the Word document - replace all instances of "&" with (eg) "aaaaa", "/" with "bbbbb", etc (anything unlikely to be in the document) Do the same with any acronym from Excel before running the search: it will then only find whole-word "matches". Like I said, pretty hacky but should meet your requirements. – Tim Williams Aug 10 '17 at 00:48
  • the problem is that msWord considers "RT&E" to be 3 words. single-step these three lines and watch your word document while single-stepping `For Each wrd In ActiveDocument.Words` `wrd.Select` `Next wrd` – jsotola Aug 10 '17 at 02:09
  • maybe the excel sheet needs to be sorted so that RT&E is found first and then RT is not counted if the color of the text is already pink – jsotola Aug 10 '17 at 02:14
  • maybe _Regex_ is the way to go. some info: https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops#22542835 – jsotola Aug 10 '17 at 02:17

1 Answers1

0

here is a rewrite of your code

it puts the data from excel into an array, then the array is searched

no correction made for the problem with special characters

Sub acroTest()

    Dim acromatch() As Variant
    ReDim acromatch(0 To 1)

    Dim acrolist As Excel.Application
    Set acrolist = New Excel.Application

    Dim acrobook As Excel.Workbook
    Set acrobook = acrolist.Workbooks.Open("P:\AcronymMacro\MasterAcronymList.xlsm")

    Dim rng As Range                          ' msWord range
    Set rng = ActiveDocument.Range

    With rng.Find                             ' set up find command
        .Format = True                        ' these are "remembered" until changed
        .MatchCase = True                     ' same as the "find" dialog box
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With

    ' Count from first row with acronym to maximum # of rows
    ' That way, list can be as long or short as needed

    ' could loop excel range this way
    '    constant xlCellTypeConstants = 2
    '    Dim acro As Excel.Range
    '    For Each acro In acrobook.Sheets(1).Range("a3:a1048576").SpecialCells(xlCellTypeConstants) ' all non-blank, non-formula cells

    Dim acro As Excel.Range
    Set acro = acrolist.Range(acrobook.Sheets(1).Range("A3"), acrobook.Sheets(1).Cells(1048576, "A").End(xlUp))    ' range A3 to last used cell in A column

    Dim wordsInExcel As Variant                            ' column A gets put into an array for faster execution

    wordsInExcel = acro.Value                              ' convert excel range to 2d array (1 x N)
    wordsInExcel = acrolist.Transpose(wordsInExcel)        ' convert result to 2d array (N x 1)
    wordsInExcel = acrolist.Transpose(wordsInExcel)        ' convert again to get 1d array

    Dim i As Long
    For i = 1 To UBound(wordsInExcel)

        rng.Find.Text = wordsInExcel(i)                    ' this is "search text"

        Do While rng.Find.Execute(Forward:=True) = True    ' do the actual search
            rng.HighlightColorIndex = wdPink
            Call InsertIntoArray(acromatch(), i + 1)
        Loop

    Next

    MsgBox Join(acromatch(), ",")

    ' Make sure you close your files, ladies and gentlemen!

    acrobook.Close False
    Set acrolist = Nothing
    Set acrobook = Nothing

End Sub

' This function resizes array and insert value as last value

Public Function InsertIntoArray(InputArray As Variant, Value As Variant)
    ReDim Preserve InputArray(LBound(InputArray) To UBound(InputArray) + 1)
    InputArray(UBound(InputArray)) = Value
End Function
jsotola
  • 2,238
  • 1
  • 10
  • 22
  • This doesn't address the actual question – Tim Williams Aug 10 '17 at 01:51
  • @TimWilliams, OP was also asking for suggestions to make code better. this one speeds it up because it needs to get data from excel only once. no way to post code in a comment – jsotola Aug 10 '17 at 02:04
  • Thank you for the response. Speed of the macro was a concern. I can't get the above code to work. I resolved a "method not found" by changing "Application" to "Excel.Application". I resolved a "Type Mismatch 4218" by changing "Set acro = range..." to "Set acro = acrolist.range", but I get a "Type mismatch 9 (subscript not found)" when the Find text begins. – dockert Aug 10 '17 at 19:41
  • @dockert, sorry about the errors. please insert `debug.print wordsInExcel(i)` before the `rng.Find.Text ... ` line. and run the code. let me know the result – jsotola Aug 10 '17 at 23:21
  • @jsotola The code does not run unless I comment out the second ".Application.Transpose". After I did this, I inserted 'debug.print wordsInExcel(i)' as you directed. The immediate window prints out the correct acronyms. The code doesn't quite work as only a handful of the acronyms end up in the array. I really appreciate the help. – dockert Aug 11 '17 at 16:52
  • @jsotola The code bug was due to the search text changing the range from the entire document to just the found word. I inserted "Set rng = ActiveDocument.range" after the call to the InsertIntoArray function, and the array fills up. – dockert Aug 11 '17 at 17:19