
I have a regex that should find a country abbreviation in an email, but it doesn't work. Either:

the expression is wrong, which I doubt, since regex101 shows it working just fine, or
I implemented it incorrectly in my VBA script.

                   Dim Countries As String
                    Countries = "(Nordic|F|USA|Norway|SK|Singapur|RU|China|Pakistan|Korea|Indien|India|Italien|UK|France|Deutschland|D|CZ|BLR|Schweden|Sweden|I|Tver|Minsk|HU|Russland|Frankreich|" & _
                                "AFG|ALA|ALB|DZA|ASM|AND|AGO|AIA|ATA|ATG|ARG|ARM|ABW|AUS|AUT|AZE|BHS|BHR|BGD|BRB|BLR|BEL|BLZ|BEN|BMU|BTN|BOL|BES|BIH|BWA|BVT|BRA|IOT|BRN|BGR|BFA|BDI|CPV|KHM|CMR|CAN|CPV|CYM|CAF|TCD|CHL|CHN|TWN|CXR|CCK|COL|COM|COD|COG|COK|CRI|CIV|HRV|CUB|CUW|CYP|CZE|KOR|COG|DNK|DJI|DMA|DOM|ECU|EGY|SLV|GNQ|ERI|EST|SWZ|ETH|FLK|FRO|FJI|FIN|FRA|GUF|PYF|ATF|GAB|GMB|GEO|DEU|GHA|GIB|GBR|GRC|GRL|GRD|GLP|GUM|GTM|GGY|GIN|GNB|GUY|HTI|HMD|VAT|HND|HKG|HUN|ISL|IND|IDN|IRN|IRQ|IRL|IMN|ISR|ITA|JAM|SJM|JPN|JEY|JOR|KAZ|KEN|KIR|PRK|KOR|KWT|KGZ|LAO|LVA|LBN|LSO|LBR|LBY|LIE|LTU|LUX|MAC|MKD|MDG|MWI|MYS|MDV|MLI|MLT|MHL|MTQ|MRT|MUS|MYT|MEX|FSM|MDA|MCO|MNG|MNE|MSR|MAR|MOZ|MMR|NAM|NRU|NPL|NLD|NCL|NZL|NIC|NER|NGA|NIU|NFK|MNP|NOR|OMN|PAK|PLW|PSE|PAN|PNG|PRY|PRC|PER|PHL|PCN|POL|PRT|PRI|QAT|TWN|KOR|COG|REU|ROU|RUS|RWA|ESH|BLM|SHN|KNA|LCA|MAF|SPM|VCT|WSM|SMR|STP|SAU|SEN|SRB|SYC|SLE|SGP|" & _
                                "SXM|SVK|SVN|SLB|SOM|ZAF|SGS|KOR|SSD|ESP|LKA|SDN|SUR|SJM|SWE|CHE|SYR|TWN|TJK|TZA|THA|TLS|TGO|TKL|TON|TTO|TUN|TUR|TKM|TCA|TUV|UGA|UKR|ARE|GBR|UMI|USA|VIR|URY|UZB|VUT|VEN|VNM|VGB|VIR|WLF|ESH|YEM|ZMB|ZWE\S)"
                    With Reg1
                        .Pattern = "2_a Von Firma[:].*" & """ & Countries & """ & "$"
                        'Formatting in email is inconsistent
                        .Global = False
                    End With
                    On Error GoTo CaseThree
                    If Reg1.Test(olMail.Body) Then
                        Set M1 = Reg1.Execute(olMail.Body)
                    End If
                    For Each M In M1
                        Debug.Print M.SubMatches(0)
                        With xExcelApp
                            Range("B6").Value = M.SubMatches(0)
                        End With
                    Next M
CaseThree:

It always gives me the result of the previous regex, which I suspect is because my If Reg1.Test(olMail.Body) Then statement does not raise an error when nothing matches, so On Error never fires.

Not Paul

2 Answers


You need to set the VBA equivalents of the m and g flags you use at regex101: .MultiLine = True and .Global = True. The code that calls .Execute is fine, since you iterate over all the matches found.

However, your regex could also use some improvement: I'd suggest using a lazy dot pattern and adding a word boundary before the country pattern.

You do not need the triple quotes in the .Pattern definition either.
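To see what the triple quotes actually do: inside a VBA string literal, a doubled quote is an escaped quote character, so """ & Countries & """ is a single literal containing the text " & Countries & ", and the variable is never concatenated. A minimal sketch (the procedure name and the shortened country list are placeholders for illustration only):

```vba
Sub DemoTripleQuotes()
    Dim Countries As String
    Countries = "(AUT|DEU)"   ' shortened, hypothetical list

    ' Original form: the middle literal holds the text " & Countries & "
    Debug.Print "2_a Von Firma[:].*" & """ & Countries & """ & "$"
    ' prints: 2_a Von Firma[:].*" & Countries & "$

    ' Intended form: the variable's value is concatenated
    Debug.Print "2_a Von Firma[:].*" & Countries & "$"
    ' prints: 2_a Von Firma[:].*(AUT|DEU)$
End Sub
```

Printing the assembled pattern to the Immediate window like this is a quick way to confirm that the code uses the same expression that was tested at regex101.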

Here is a sample fix:

Countries = "\b(Nordic|F|USA|Norway|SK|Singapur|RU|China|Pakistan|Korea|Indien|India|Italien|UK|France|Deutschland|D|CZ|BLR|Schweden|Sweden|I|Tver|Minsk|HU|Russland|Frankreich|" & _
    "AFG|ALA|ALB|DZA|ASM|AND|AGO|AIA|ATA|ATG|ARG|ARM|ABW|AUS|AUT|AZE|BHS|BHR|BGD|BRB|BLR|BEL|BLZ|BEN|BMU|BTN|BOL|BES|BIH|BWA|BVT|BRA|IOT|BRN|BGR|BFA|BDI|CPV|KHM|CMR|CAN|CPV|CYM|CAF|TCD|CHL|CHN|TWN|CXR|CCK|COL|COM|COD|COG|COK|CRI|CIV|HRV|CUB|CUW|CYP|CZE|KOR|COG|DNK|DJI|DMA|DOM|ECU|EGY|SLV|GNQ|ERI|EST|SWZ|ETH|FLK|FRO|FJI|FIN|FRA|GUF|PYF|ATF|GAB|GMB|GEO|DEU|GHA|GIB|GBR|GRC|GRL|GRD|GLP|GUM|GTM|GGY|GIN|GNB|GUY|HTI|HMD|VAT|HND|HKG|HUN|ISL|IND|IDN|IRN|IRQ|IRL|IMN|ISR|ITA|JAM|SJM|JPN|JEY|JOR|KAZ|KEN|KIR|PRK|KOR|KWT|KGZ|LAO|LVA|LBN|LSO|LBR|LBY|LIE|LTU|LUX|MAC|MKD|MDG|MWI|MYS|MDV|MLI|MLT|MHL|MTQ|MRT|MUS|MYT|MEX|FSM|MDA|MCO|MNG|MNE|MSR|MAR|MOZ|MMR|NAM|NRU|NPL|NLD|NCL|NZL|NIC|NER|NGA|NIU|NFK|MNP|NOR|OMN|PAK|PLW|PSE|PAN|PNG|PRY|PRC|PER|PHL|PCN|POL|PRT|PRI|QAT|TWN|KOR|COG|REU|ROU|RUS|RWA|ESH|BLM|SHN|KNA|LCA|MAF|SPM|VCT|WSM|SMR|STP|SAU|SEN|SRB|SYC|SLE|SGP|" & _
    "SXM|SVK|SVN|SLB|SOM|ZAF|SGS|KOR|SSD|ESP|LKA|SDN|SUR|SJM|SWE|CHE|SYR|TWN|TJK|TZA|THA|TLS|TGO|TKL|TON|TTO|TUN|TUR|TKM|TCA|TUV|UGA|UKR|ARE|GBR|UMI|USA|VIR|URY|UZB|VUT|VEN|VNM|VGB|VIR|WLF|ESH|YEM|ZMB|ZWE\S)"
With Reg1
    .Pattern = "2_a Von Firma:.*?" & Countries & "$"
    'Formatting in email is inconsistent
    .Global = True
    .Multiline = True
End With
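
The effect of the MultiLine flag can be checked in isolation. The sketch below assumes a made-up two-line body and a shortened country list; only the RegExp properties themselves (Pattern, MultiLine, Test, Execute, SubMatches) come from the VBScript regex library.

```vba
Sub DemoMultiLine()
    Dim re As Object, body As String
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed

    ' Hypothetical body: the country sits at the end of the first line
    body = "2_a Von Firma: ACME AUT" & vbLf & "3_a Ort: Wien"
    re.Pattern = "2_a Von Firma:.*?\b(AUT|DEU)$"

    re.MultiLine = False
    Debug.Print re.Test(body)   ' False: $ only matches at the very end of the string

    re.MultiLine = True
    Debug.Print re.Test(body)   ' True: $ also matches before the line break
    Debug.Print re.Execute(body)(0).SubMatches(0)   ' AUT
End Sub
```

Real Outlook bodies may use other line-break sequences than the vbLf assumed here, so it is worth testing against an actual message.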
Wiktor Stribiżew
  • I want it to give me only one match. This [Updated regex](https://regex101.com/r/EdBPvF/1) shows no matches, since there isn't a country in the email, so the script keeps the previous match and the `On Error` is never triggered. An easy fix, I suppose, would be to change the regex so that it matches the line and returns nothing for the country. – Not Paul Jul 22 '21 at 08:28
  • @NotPaul At any rate, you must sync the flags you use at regex101.com and the code you use, and fix the regex pattern the way I showed to match the countries where they are present. When they are missing, just add `Else` after `.Test`. – Wiktor Stribiżew Jul 22 '21 at 08:31
  • I just tried it and you can see in [this regex101](https://regex101.com/r/CMTX5W/1) that it matches the AUT correctly but if I were to delete that AUT it would match nothing, likely resulting in the match being the previous one in my script. So if there's a way to match the line without the country and Group 1 being empty it should detect it as an error and skip it. – Not Paul Jul 22 '21 at 08:38
  • @NotPaul Something like https://regex101.com/r/CMTX5W/2? – Wiktor Stribiżew Jul 22 '21 at 08:42
  • Perfect, I think that should work, provided the `On Error` returns an error on a match with no group. – Not Paul Jul 22 '21 at 08:50
  • @NotPaul I think `.Submatches(0)` will always be initialized with an empty string if there is no match. Instead of `On Error`, you could just check if it is empty or not. – Wiktor Stribiżew Jul 22 '21 at 08:51
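
The suggestion in the comments (an `Else` branch after `.Test`, plus an emptiness check instead of `On Error`) could look roughly like this. It is only a sketch: the pattern is a simplified stand-in with an optional country group, not the one from the regex101 links, and the sample line is invented.

```vba
Sub DemoEmptySubmatch()
    Dim re As Object, ms As Object, body As String
    Set re = CreateObject("VBScript.RegExp")
    re.MultiLine = True
    ' Simplified stand-in pattern: the country group is optional
    re.Pattern = "2_a Von Firma:.*?(?:\b(AUT|DEU))?$"

    body = "2_a Von Firma: ACME GmbH"   ' hypothetical line without a country
    If re.Test(body) Then
        Set ms = re.Execute(body)
        ' Len is 0 for both an empty string and an Empty Variant
        If Len(ms(0).SubMatches(0)) = 0 Then
            Debug.Print "line found, but no country"
        Else
            Debug.Print ms(0).SubMatches(0)
        End If
    Else
        Debug.Print "line not found"
    End If
End Sub
```

Because the match collection is only read inside the `If` branch, a result left over from an earlier regex cannot leak into the output.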

Based on @Wiktor Stribiżew's solution:

                    Dim Countries As String
                        Countries = "2_a Von Firma:(?:(?!\b(?:Nordic|F|USA|Norway|SK|Singapur|RU|China|Pakistan|Korea|Indien|India|Italien|UK|France|Deutschland|D|CZ|BLR|Schweden|Sweden|I|Tver|Minsk|HU|Russland|Frankreich|" & _
"AFG|ALA|ALB|DZA|ASM|AND|AGO|AIA|ATA|ATG|ARG|ARM|ABW|AUS|AUT|AZE|BHS|BHR|BGD|BRB|BLR|BEL|BLZ|BEN|BMU|BTN|BOL|BES|BIH|BWA|BVT|BRA|IOT|BRN|BGR|BFA|BDI|CPV|KHM|CMR|CAN|CPV|CYM|CAF|TCD|CHL|CHN|TWN|CXR|CCK|COL|COM|COD|COG|COK|CRI|CIV|HRV|CUB|CUW|CYP|CZE|KOR|COG|DNK|DJI|DMA|DOM|ECU|EGY|SLV|GNQ|ERI|EST|SWZ|ETH|FLK|FRO|FJI|FIN|FRA|GUF|PYF|ATF|GAB|GMB|GEO|DEU|GHA|GIB|GBR|GRC|GRL|GRD|GLP|GUM|GTM|GGY|GIN|GNB|GUY|HTI|HMD|VAT|HND|HKG|HUN|ISL|IND|IDN|IRN|IRQ|IRL|IMN|ISR|ITA|JAM|SJM|JPN|JEY|JOR|KAZ|KEN|KIR|PRK|KOR|KWT|KGZ|LAO|LVA|LBN|LSO|LBR|LBY|LIE|LTU|LUX|MAC|MKD|MDG|MWI|MYS|MDV|MLI|MLT|MHL|MTQ|MRT|MUS|MYT|MEX|FSM|MDA|MCO|MNG|MNE|MSR|MAR|MOZ|MMR|NAM|NRU|NPL|NLD|NCL|NZL|NIC|NER|NGA|NIU|NFK|MNP|NOR|OMN|PAK|PLW|PSE|PAN|PNG|PRY|PRC|PER|PHL|PCN|POL|PRT|PRI|QAT|TWN|KOR|COG|REU|ROU|RUS|RWA|ESH|BLM|SHN|KNA|LCA|MAF|SPM|VCT|WSM|SMR|STP|SAU|SEN|SRB|SYC" & _
"|SLE|SGP|SXM|SVK|SVN|SLB|SOM|ZAF|SGS|KOR|SSD|ESP|LKA|SDN|SUR|SJM|SWE|CHE|SYR|TWN|TJK|TZA|THA|TLS|TGO|TKL|TON|TTO|TUN|TUR|TKM|TCA|TUV|UGA|UKR|ARE|GBR|UMI|USA|VIR|URY|UZB|VUT|VEN|VNM|VGB|VIR|WLF|ESH|YEM|ZMB|ZWE)).)*\b(Nordic|F|USA|Norway|SK|Singapur|RU|China|Pakistan|Korea|Indien|India|Italien|UK|France|Deutschland|D|CZ|BLR|Schweden|Sweden|I|Tver|Minsk|HU|Russland|Frankreich|AFG|ALA|ALB|DZA|ASM|AND|AGO|AIA|ATA|ATG|ARG|ARM|ABW|AUS|AUT|AZE|BHS|BHR|BGD|BRB|BLR|BEL|BLZ|BEN|BMU|BTN|BOL|BES|BIH|BWA|BVT|BRA|IOT|BRN|BGR|BFA|BDI|CPV|KHM|CMR|CAN|CPV|CYM|CAF|TCD|CHL|CHN|TWN|CXR|CCK|COL|COM|COD|COG|COK|CRI|CIV|HRV|CUB|CUW|CYP|CZE|KOR|COG|DNK|DJI|DMA|DOM|ECU|EGY|SLV|GNQ|ERI|EST|SWZ|ETH|FLK|FRO|FJI|FIN|FRA|GUF|PYF|ATF|GAB|GMB|GEO|DEU|GHA|GIB|GBR|GRC|GRL|GRD|GLP|GUM|GTM|GGY|GIN|GNB|GUY|HTI|HMD|VAT|HND|HKG|HUN|ISL|IND|IDN|IRN|IRQ|IRL|IMN|ISR|ITA" & _
"|JAM|SJM|JPN|JEY|JOR|KAZ|KEN|KIR|PRK|KOR|KWT|KGZ|LAO|LVA|LBN|LSO|LBR|LBY|LIE|LTU|LUX|MAC|MKD|MDG|MWI|MYS|MDV|MLI|MLT|MHL|MTQ|MRT|MUS|MYT|MEX|FSM|MDA|MCO|MNG|MNE|MSR|MAR|MOZ|MMR|NAM|NRU|NPL|NLD|NCL|NZL|NIC|NER|NGA|NIU|NFK|MNP|NOR|OMN|PAK|PLW|PSE|PAN|PNG|PRY|PRC|PER|PHL|PCN|POL|PRT|PRI|QAT|TWN|KOR|COG|REU|ROU|RUS|RWA|ESH|BLM|SHN|KNA|LCA|MAF|SPM|VCT|WSM|SMR|STP|SAU|SEN|SRB|SYC|SLE|SGP|SXM|SVK|SVN|SLB|SOM|ZAF|SGS|KOR|SSD|ESP|LKA|SDN|SUR|SJM|SWE|CHE|SYR|TWN|TJK|TZA|THA|TLS|TGO|TKL|TON|TTO|TUN|TUR|TKM|TCA|TUV|UGA|UKR|ARE|GBR|UMI|USA|VIR|URY|UZB|VUT|VEN|VNM|VGB|VIR|WLF|ESH|YEM|ZMB|ZWE)?$"
                    With Reg1
                        .Pattern = Countries
                        'Formatting in email is inconsistent
                        .Global = False
                        .MultiLine = True
                    End With
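                    If Reg1.Test(olMail.Body) Then
                        Set M1 = Reg1.Execute(olMail.Body)
                        'Loop inside the If so a stale M1 from an earlier regex is never reused
                        For Each M In M1
                            Debug.Print M.SubMatches(0)
                            With xExcelApp
                                'Leading dot qualifies Range with xExcelApp
                                .Range("B6").Value = M.SubMatches(0)
                            End With
                        Next M
                    End If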

Put the entire regex in Countries.
Set .Global = False, .Multiline = True

If the line contains no country, M.SubMatches(0) returns Empty, so the cell will contain no value.
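
The pattern above repeats the full country list twice. One way to keep the two copies in sync, sketched under the assumption that the list is held as a bare alternation string (BuildCountryPattern and alts are hypothetical names, not part of the original code), is to assemble the pattern from a single variable:

```vba
Function BuildCountryPattern(ByVal alts As String) As String
    ' alts is a bare alternation such as "AUT|DEU|India" (no parentheses)
    BuildCountryPattern = "2_a Von Firma:(?:(?!\b(?:" & alts & ")).)*\b(" & alts & ")?$"
End Function

Sub DemoBuildPattern()
    Debug.Print BuildCountryPattern("AUT|DEU")
    ' prints: 2_a Von Firma:(?:(?!\b(?:AUT|DEU)).)*\b(AUT|DEU)?$
End Sub
```

The result would then be assigned to .Pattern in place of the hand-written string; this changes only how the pattern is built, not how it performs.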

Not Paul
  • It is good that you shared the final solution, although the regex used here is not optimized: if your cells contain very long texts with many words that start with your alternatives, performance may drop considerably. That is why I decided not to publish this pattern in my answer. You can improve performance with a regex trie; the myregextester.com service can build one (see [how](https://stackoverflow.com/a/68275882/3832970)). – Wiktor Stribiżew Jul 23 '21 at 09:59