0

It's a follow up for this question and a print screen embedded there with document view still applies. The code is run from Excel VBA editor.

There is unknown number of email addresses in the Word document and:

  1. I need to extract all of them,

  2. concatenate into one string that contains all of the email addresses, separated using ", "

  3. and fill the string into Excel cell Activesheet.Range("C31")

Currently I have a code that finds the @ sign and builds email address around that. This is how it looks like:

Sub FindEmail036()         '[0-9;A-z;,._-]{1;}\@[0-9;A-z;._-]{1;}
                           '[0-9;A-z;,._-]{1;}\@[0-9;A-z;._-]{1;}
Dim WordApp As Word.Application
Dim WordDoc As Word.Document
Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim emailAdr As String
Dim ws As Worksheet
Dim iCount As Integer

Set WordApp = GetObject(, "Word.Application")
Set ExcelApp = GetObject(, "Excel.Application")
Set WordDoc = WordApp.ActiveDocument
Set rng = WordApp.ActiveDocument.Content
Set ws = ExcelApp.ActiveSheet

ExcelApp.Application.Visible = True

    With rng.Find
        .Text = "@"
        .Wrap = wdFindContinue
        .Forward = True
        .MatchWildcards = False
        .Execute

        Debug.Print rng.Text
        If .Found = True Then
            rng.MoveStartUntil Cset:=" ", Count:=wdBackward
            Debug.Print rng.Text
            rng.MoveEndUntil Cset:=","
            Debug.Print rng.Text
            'rng.MoveEndUntil Cset:=" ", Count:=wdBackward
        End If
    End With     'how to create loop that will extract all the email addresses in the document??
    ws.Range("C31").Value = rng
End Sub

This code only extracts the first email address and is not looking for next email addresses. I know this because I'm debugging using F8 and Immediate window and I can see, this code is just finishing search after it finds @ and constructs first complete email address.

I guess some loop is necessary but I don't know how to do write it.

I've also found this source but I don't understand much from it. https://wordmvp.com/FAQs/MacrosVBA/NoTimesTextInDoc.htm

michal roesler
  • 479
  • 2
  • 9
  • 26
  • Extract, but then what are you looking to do with each address? – Joey Feb 10 '20 at 15:53
  • I need to concatenate all email addresses into one string that contains all of the email addresses, separated using ", " and fill the string into one Excel cell Activesheet.Range("C31"). – michal roesler Feb 10 '20 at 16:42
  • Re using RegEx to find email addresses, [you might find this of use](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) – chris neilsen Feb 11 '20 at 01:06

4 Answers4

1

I recommend using regular expressions.

Check Reference: Microsoft VBscript Regular Expressions X.X

Sub FindEmail()
    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim ExcelApp As Excel.Application

    Dim StrInput As String, sPattern As String
    Dim oEmail As MatchCollection
    Dim Ws As Worksheet
    Dim vR()
    Dim n As Long, i As Long

    Set WordApp = GetObject(, "Word.Application")
    Set ExcelApp = GetObject(, "Excel.Application")
    Set WordDoc = WordApp.ActiveDocument

    StrInput = WordDoc.Content
    Set Ws = ExcelApp.ActiveSheet

    sPattern = "([A-z0-9.]{1,})(@)([A-z0-9]{0,})(.)([A-z0-9]{1,})"

    Set oEmail = GetRegEx(StrInput, sPattern)
    For i = 0 To oEmail.Count - 1
        n = n + 1
        ReDim Preserve vR(1 To n)
        vR(n) = oEmail.Item(i)
    Next
    'Ws.Range("c31").Resize(n) = WorksheetFunction.Transpose(vR)
    Ws.Range("c31") = Join(vR, ", ") '<~~ single string
End Sub
Function GetRegEx(StrInput As String, strPattern As String) As Object
    Dim RegEx As New RegExp
    Set RegEx = New RegExp
    With RegEx
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = strPattern
    End With
    If RegEx.Test(StrInput) Then
        Set GetRegEx = RegEx.Execute(StrInput)
    End If
End Function

Your word document has multiple lines, so I set mutiline = true in the regex setting. The regular expression therefore stores all of its contents in matchcollection. Put this stored item in a dynamic array and do the next thing. You can store an array in multiple cells, or create a single character using the join function.

enter image description here

Dy.Lee
  • 7,527
  • 1
  • 12
  • 14
  • Thank you for this answer. Can you elaborate on why do I need this function you wrote (I didn't expect a function as an answer to my question) and what is it doing? I was aware that RegExp could be use to help in extracting those mails but I was expecting that the macro could also concatenate the retrieved email addresses into one single string with ```", "``` as delimiter. – michal roesler Feb 10 '20 at 19:10
1

It's finishing because the way Range.Find works is that it sets the range equal to what it finds. So it finds the @, sets the range equal to it, and now there are no more @ in the range. You need another range to manipulate, because manipulating your search range will only screw up your results.

You can loop with a Do While .Found = True (my preferred method). Make sure that you set .Wrap = wdFindStop or you will have an infinite loop.

I'd put the results in a dictionary.

Dim eAddresses As Object: Set eAddresses = CreateObject("Scripting.Dictionary")

Dim rng As Range
Set rng = ActiveDocument.Content
Dim srchRng As Range

Dim addressNum As Long
addressNum = 1

With rng.Find
        .Text = "@"
        .Wrap = wdFindStop
        .Forward = True
        .MatchWildcards = False
        .Execute
        Debug.Print rng.Text
        Do While .Found

            Set srchRng = rng.Duplicate
            srchRng.MoveStartUntil Cset:=" ", Count:=wdBackward
            Debug.Print srchRng.Text
            srchRng.MoveEndUntil Cset:=","

            If Not eAddresses.Exists(srchRng.Text) Then
                eAddresses.Add srchRng.Text, addressNum
                addressNum = addressNum + 1
            End If
            .Execute
        Loop
    End With

End Sub

As a side note, when you push these to production, I'd definitely pull out all the Debug.Print statements. It makes for a cluttered immediate window, especially if you plan on printing useful metrics and/or errors to the immediate window (which I recommend).

jclasley
  • 668
  • 5
  • 14
  • Okey. I've checked it and it works. Can you please tell me what should I put into cell ```Activesheet.Range("C31").Value = ???``` I'll read about this "dictionary" tomorrow but can I write this dictionary into the ```Range("C31")```. Please provide the last, input into the Excel cell, part of the code. – michal roesler Feb 10 '20 at 19:52
  • 1
    Gonna have to do a for loop and iterate through the range with your values – jclasley Feb 10 '20 at 20:17
  • 1
    If you are using a scripting dictionary then you can compile the values (Items) of the dictionary to a single string using Join(myDic.Items, ",") – freeflow Feb 11 '20 at 11:25
  • 1
    @jclasley A useful trick to remember when populating scripting dictionaries is to use the length of the dictionary to generate the key for the next item. e.g. with myDic: .add .Count, Item: End with This removes the need for a separate variable for the Key. – freeflow Feb 11 '20 at 11:47
  • @Freeflow I was using the email address as the key so as to avoid duplicate values, though your suggestion works either way! – jclasley Feb 11 '20 at 13:31
  • Again and again I'm coming back to your answers @jclasley. I'm using different parts of your code for different macros I'm writing. Today I'm here taking ```Set srchRng = rng.Duplicate``` and the whole duplicate rng idea. Thank you for this answer and for the guidance and explanation in the beginning of your answer. – michal roesler Feb 18 '20 at 14:04
  • 1
    @michalroesler Thanks for the update, I'm glad that I'm able to help you! – jclasley Feb 19 '20 at 16:11
1

Other responders have identified the cause of your problem so I won't reiterate that. However, your requirement is a common pattern in VBA/Word, namely find something then do something as a consequence of the find (other than a replace). I generally wrap this pattern in a function or sub depending upon what action is required once the find text has been found..

If you haven't used a scripting.dictionary before than I would use early binding (as in the code below) so that you get access to intellisense for the methods and properties. This means using Tools.Reference to add the Microsoft Scripting.Runtime library to the VBIDE.

You'll see that we recalculate the end of the document each time we run through the While loop. This is good practise because we don't know in advance the impact that the find actions will have on the length of the document.

The DoEvents in the While loop ensures that you can quickly break out of the loop if things go wrong.

The function below uses a Word wildcard search to search for email addresses. The find is precise so there is no need to adjust the ends of the found range to get only the email address.

If the action in the found do loop was complicated then I would break this out to a separate function passing the found range to the function as .Duplicate. In this particular case that would also mean that I would move the scripting dictionary from a local variable to a module scope variable

Public Function GetEmailAddressesAsString(ByVal ipDoc As Word.Document) As String

    Const EmailAddress As String = "<[0-9A-Za-z._]{1,}\@[0-9A-Za-z.\_]{1,}>"

    With ipDoc.StoryRanges(wdMainTextStory)

        With .Find

            .ClearFormatting
            .Wrap = wdFindStop
            .MatchWildcards = True
            .text = EmailAddress

        End With

        Dim myAddresses As Scripting.Dictionary
        Set myAddresses = New Scripting.Dictionary

        Do While .Find.Execute

            DoEvents
            myAddresses.Add myAddresses.Count, .text
            .MoveStart Count:=.Characters.Count + 1
            .End = ipDoc.StoryRanges(wdMainTextStory).End

        Loop

    End With

    GetEmailAddressesAsString = Join(myAddresses.Items, ",")

End Function
freeflow
  • 4,129
  • 3
  • 10
  • 18
1

I already effectively answered this in your other thread:

Sub Demo()
Dim wdApp As Word.Application, StrOut As String
Set wdApp = GetObject(, "Word.Application")
With wdApp.ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "<[0-9A-ÿ.\-]{1,}\@[0-9A-ÿ\-.]{1,}([^13 -/\:-\@\\-`\{-¿])"
    .Replacement.Text = ""
    .Forward = True
    .Format = False
    .Wrap = wdFindStop
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    StrOut = StrOut & Trim(.Text) & " "
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
End With
StrOut = Replace(Trim(StrOut), " ", ", ")
ActiveSheet.Range("C31").Value = StrOut
End Sub

Note how little differs between this code and the code I posted in your other thread.

macropod
  • 12,757
  • 2
  • 9
  • 21
  • Thank you for that answer. Can You please tell me in this string you proposed, ```"<[0-9A-?.\-]{1,}\@[0-9A-?\-.]{1,}([^13 -/\:-\@\\-`\{-?])"``` the first sign ```<``` what it stands for? And in the last brackets what ```^13``` stands for? – michal roesler Feb 13 '20 at 11:02
  • 1
    I've edited the Find expression to correct errors introduced into it by the forum software, which doesn't like some high-end ASCII characters being pasted in. To understand what the expression does, you need to learn about wildcards in Word. See: https://wordmvp.com/FAQs/General/UsingWildcards.htm – macropod Feb 13 '20 at 11:08
  • It's different from other RegExp strings used to find email addresses I saw on the internet. ```([^13 -/\:-\@\\-`\{-?])``` this part for example. What it does? I usually found only this: ```[0-9A-ÿ.\-]{1,}\@[0-9A-ÿ\-.]{1,}```. – michal roesler Feb 13 '20 at 11:09
  • 1
    Word's wildcards are a variation on Regular Expressions. – macropod Feb 13 '20 at 11:13
  • At the moment I'm reading this post https://www.msofficeforums.com/word-vba/13334-extract-phone-number-word-file.html on a forum where you are an Admin. Can you just explain the single line ```StrTxt = "|"``` and especially the "|" sign meaning? – michal roesler Feb 13 '20 at 12:08
  • 1
    That's just a pipe delimiter with which the StrTxt variable is pre-populated. If you want to ask questions about code on another forum, you should ask there, not here. – macropod Feb 13 '20 at 21:57