1

I have a large number of txt-files, which I want to search for specific words. My approach is using an excel macro to open the txt files in word and than search each occurence of a list of words, which I provide in the excel file. It gives me a list, of how often each word occurs in each document. I have managed to do so, using the following code:

Sub CounterofWords()

Application.ScreenUpdating = False

    Dim wdApp As Word.Application
    Set wdApp = CreateObject("Word.application")
    wdApp.Visible = False

For d = 1 To 23

    Dim wdDoc As Word.Document


    FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_htm.txt"
    On Error GoTo txtdesign
    Set wdDoc = wdApp.Documents.Open(filename:=FName)

i = 15

Do While Cells(1, i) <> ""

iCount = 0
Application.ScreenUpdating = False

With wdApp.Selection.Find
 .ClearFormatting
 .Text = Cells(1, i).Value
        Do While .Execute
            iCount = iCount + 1
            wdApp.Selection.MoveRight
        Loop
End With
Cells(d + 1, i).Value = iCount

i = i + 1
Loop





wdDoc.Close savechanges:=False
Set wdDoc = Nothing

Next d

wdApp.Quit
Set wdApp = Nothing

Application.ScreenUpdating = True

Exit Sub

txtdesign:
FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_txt.txt"
Resume

End Sub

Here you can see the relevant part of my spreadsheet, where I ran the macro for the first 23 documents.

Everything works fine so far. Now I want to be able to search for regular expressions. I need this for example to avoid certain combinations of words in my search.

It seems to be a problem that I can not write something like

With wdApp.Selection.regex

Anyways, I don't know how to make regex work in a situation like this and appreciate your help!

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Andreas K
  • 63
  • 6

1 Answers1

1

The Find method in VBA has limited pattern-matching, using this flag:

Selection.Find.MatchWildcards = True

Note: your code would not get correct results as it is, because the search for each word starts where the previous one left off in the document. You need to "move" to the top of the document for each one:

Selection.HomeKey Unit:=wdStory

But if you need more complex pattern-matching using regular expressions, you'll need a different approach, using the RegExp class, after referencing "Microsoft VBScript Regular Expressions 5.5". See a great explanation in the accepted answer to this SO question: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops.

Here's an example using regex:

Do While Cells(1, i) <> ""
    Application.ScreenUpdating = False
    Dim regEx As New RegExp
    Dim Matches As MatchCollection

    With regEx
        .Global = True
        .IgnoreCase = True
        .Pattern = Cells(1, i).Value
    End With

    Set Matches = regEx.Execute(wdDoc.Content.Text)
    Cells(d + 1, i).Value = Matches.Count
    i = i + 1
Loop
Community
  • 1
  • 1
Brian Stephens
  • 5,161
  • 19
  • 25
  • Thank you for your help so far. I need the RegExp class and already looked up the article you mentioned. Unfortunately, I can't figure out, how to make it work in my case, where I want to use regex in order to search word from a vba excel macro. – Andreas K Sep 06 '16 at 17:53
  • @AndreasK: I added some sample code using regex, that expects your regex pattern to be in the Excel sheet instead of plain-text search terms. See if this is the answer you are looking for. – Brian Stephens Sep 08 '16 at 14:43