0

There is a document like this one. I process 20 documents like this every day and they all look the same (structure, I mean, is very consistent).

enter image description here

The goal of this macro is to extract ONLY THE FIRST match of the RegEx pattern from the .ActiveDocument.Content. In the whole doc there is many more matches, but I need only the first one. The document being processed will be manually opened before the macro would run.

I'm just a VBA beginner so if there is a possibility to write it without using arrays, collections or some dictionaries I'd much appreciate. There is just one item to extract, so it's best to load it inside repNmbr string variable and from there just ws.Range("G30").Value = repNmbr. The simpler the better.

I used these resources Excel Regex Tutorial (Regular Expressions) which is very helpful but I still don't know how to load the FIRST MATCH alone into my repNmbr string variable. I'd like to do this without using any loop, because I just want to load a single string into this repNmbr variable.

Currently I have code like this:

Sub ExtractRepertor03()
    'Application.ScreenUpdating = False
    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim ExcelApp As Excel.Application
    Dim rng As Word.Range
    Dim ws As Worksheet
    Dim regEx As Object
    Dim matches As MatchCollection
    Dim match As String
    Dim repNmbr As String

    'Assigning object variables
    Set WordApp = GetObject(, "Word.Application")      'ActiveX can't create object is when
    Set ExcelApp = GetObject(, "Excel.Application")    'there is no Word document open;
    Set regEx = CreateObject("VBScript.RegExp")
    Set WordDoc = WordApp.ActiveDocument
    Set rng = WordApp.ActiveDocument.Content

    'Create the regular expression object
    regEx.Global = False    'because I need only the first match instead of all occurences;
    regEx.IgnoreCase = True
    regEx.Pattern = "([0-9]{1,5})([ ]{0,4})([/])([0-9]{4})"
    'regEx.Pattern = "([0-9]{1,5})([\s]{0,4})(/[0-9]{4})"

    repNmbr = regEx.Execute(rng.text)      'here is something wrong but I don't know what;
                            'I'm trying to assign the first RegEx match to repNmbr variable;
    Debug.Print repNmbr
    repNmbr = Replace(repNmbr, " ", "") 
'    Set matches = regEx.Execute(rng.text)
'    Debug.Print regEx.Test(rng)
'    'Debug.Print regEx.Value
'        For Each match In matches    'I just want this macro run without the loop
'           Debug.Print match.Value   'Result: 9042 /2019
'           repNmbr = match.Value
'        Next match

    ExcelApp.Application.Visible = True
    ws.Range("G30").Value = repNmbr
End Sub

And an error like this:

enter image description here

Can someone explain to me why Set matches = regEx.Execute(rng.text) works fine but repNmbr = regEx.Execute(rng.text) returns the error: "Wrong number of arguments or invalid property assignment"??

After regEx.Global = False is set, the RegEx finds only a single value, so why VBA refuses to assign this string into the repNmbr string variable??

michal roesler
  • 479
  • 2
  • 9
  • 26
  • 1
    Try using of `Set repNmbr = regEx.Execute(rng.text)` but after declaring it in this way: `Dim repNmbr As MatchCollection`. The first occurrence would be `repNmbr(1)`... – FaneDuru Feb 16 '20 at 15:32
  • For your luck, I am Polish, and I warn you: this contains very sensitive personal data and you could be prosecuted for publishing it like that. Try figuring out just a sample for presenting what is your problem. – Michał Turczyn Feb 16 '20 at 15:35
  • Or try `repNmbr = regEx.Execute(rng.text) (0)` keeping your initial declaration. Or `repNmbr = regEx.Execute(rng.text).allMatches.Item(0).submatches.Item(0)`. I must tell you that I do not have any possibility to test it... – FaneDuru Feb 16 '20 at 15:51
  • 1
    First: do ***not*** try to use RegEx to extract information from Word. Use Word's built-in **Find** functionality. To get the basic syntax, first experiment with Find as an end-user (Ctrl+H, then select the Find tab, click "More" to see all possible options). Take a close look at "wildcards", which is Word's built-in "reg ex". If you have trouble working out the Find term, ask in an end-user venue such as Super User. Then record a macro to get the basic syntax needed for the VBA code. To get the first match, set the `Forward` property to True (it's the default). – Cindy Meister Feb 16 '20 at 16:10
  • Also, stick to one question, don't repeat your question in the hopes of getting different answers. One of the answers there provided a wildcard suggestion - work with that person: [RegEx find string containing numbers with one space, two spaces or no space at all followed by / and four numbers](https://stackoverflow.com/questions/60233258/regex-find-string-containing-numbers-with-one-space-two-spaces-or-no-space-at-a) – Cindy Meister Feb 16 '20 at 16:13
  • @Michał Turczyn Thank you Michał for your concern. Every name, PESEL and address in the screen shot you're referring to is fake. You're Polish so you know that Polish citizens rarely have 3 names and never have 3 surnames. – michal roesler Feb 16 '20 at 17:50
  • In my opinion this is a different question from https://stackoverflow.com/questions/60233258/regex-find-string-containing-numbers-with-one-space-two-spaces-or-no-space-at-a?noredirect=1#comment106555506_60233258 In this question I'm further into the code, I have the match and I'm trying to simplify the code which is shown on other resources, containing tools (like arrays, collections or dictionaries), which I think are unnecessary to pass this single value to the ```repNmbr``` variable. I don't understand why the var needs to be dimed as ```As MatchCollection``` as it only stores 1 string. – michal roesler Feb 16 '20 at 18:07
  • Or maybe, if the conditions I proposed for the answer are absurd, maybe it can be done using ```For Each Match In theMatches Debug.Print Match.Value Next Match``` to populate the collection with all the matches from whole document and then just assign the first match to the ```repNmbr``` String variable. It's just my first time using RegEx and it's hard for me to grasp it. I don't need all matches. I just need the first one to assign it to the string variable but I've never done this before and I hoped This is the place to ask :-) Love U all guys. – michal roesler Feb 16 '20 at 19:00
  • 1
    It must be dimed as `As MatchCollection` because this variable type is returned by the function in discussion. Even if there is only one occurrence. Or you can use it in the second way ai recommended: `repNmbr = regEx.Execute(rng.text) (0)`. Which means the first element of the returned collection... And if you address to someone here, it is recommended to use @FaneDuru (in my case). You can see our comments because you are the one asking the question... – FaneDuru Feb 16 '20 at 19:41
  • 1
    I wanted to refer at notifications receiving... – FaneDuru Feb 16 '20 at 19:49
  • «In my opinion this is a different question». In reality, though, it's nothing more than an *extension* of that other question... – macropod Feb 16 '20 at 23:47

1 Answers1

1

As I said in your other question, you don't need the RegEx library for this. Stick to Word's wildcards! Try:

Sub Demo()
Application.ScreenUpdating = False
Dim WordApp As Word.Application
Set WordApp = GetObject(, "Word.Application")
With WordApp.ActiveDocument.Range
  With .Find
    .Text = "<[0-9 ]{1,7}/[0-9]{4}>"
    .MatchWildcards = True
    .Wrap = wdFindStop
    .Forward = True
    .Execute
  End With
  If .Find.Found = True Then ActiveSheet.Range("G30").Value = Replace(.Text, " ", "")
End With
Application.ScreenUpdating = True
End Sub

Note: I haven't bothered with any of:

Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim ws As Worksheet
Dim regEx As Object
Dim matches As MatchCollection
Dim match As String
Dim repNmbr As String

as it's all superfluous - even your own code never assigns anything to ws.

macropod
  • 12,757
  • 2
  • 9
  • 21