741

How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?

  • In-cell function to return a matched pattern or replaced value in a string.
  • Sub to loop through a column of data and extract matches to adjacent cells.
  • What setup is necessary?
  • What are Excel's special characters for Regular expressions?

I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use Left, Mid, Right, Instr type commands for similar manipulations.

Andrei Konstantinov
  • 6,971
  • 4
  • 41
  • 57
Automate This
  • 30,726
  • 11
  • 60
  • 82
  • 17
    I high recommend [this VB/VBA Regexp article](http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html) by Patrick Matthews – brettdj Apr 17 '14 at 13:26
  • 1
    Try this free add-in: http://seotoolsforexcel.com/regexpfind/ – Niels Bosma Sep 23 '15 at 06:28
  • 5
    Let's not forget the `Like` [operator](https://learn.microsoft.com/en-us/dotnet/visual-basic/language-reference/operators/like-operator), which provides a sort of light version of regex-style functionality. It's typically much faster than regex, even if wrapped in a sub or function procedure. – Egalth Nov 01 '18 at 10:09

9 Answers9

1127

Regular expressions are used for Pattern Matching.

To use in Excel follow these steps:

Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"

  • Select "Developer" tab (I don't have this tab what do I do?)
  • Select "Visual Basic" icon from 'Code' ribbon section
  • In "Microsoft Visual Basic for Applications" window select "Tools" from the top menu.
  • Select "References"
  • Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include in your workbook.
  • Click "OK"

Step 2: Define your pattern

Basic definitions:

- Range.

  • E.g. a-z matches an lower case letters from a to z
  • E.g. 0-5 matches any number from 0 to 5

[] Match exactly one of the objects inside these brackets.

  • E.g. [a] matches the letter a
  • E.g. [abc] matches a single letter which can be a, b or c
  • E.g. [a-z] matches any single lower case letter of the alphabet.

() Groups different matches for return purposes. See examples below.

{} Multiplier for repeated copies of pattern defined before it.

  • E.g. [a]{2} matches two consecutive lower case letter a: aa
  • E.g. [a]{1,3} matches at least one and up to three lower case letter a, aa, aaa

+ Match at least one, or more, of the pattern defined before it.

  • E.g. a+ will match consecutive a's a, aa, aaa, and so on

? Match zero or one of the pattern defined before it.

  • E.g. Pattern may or may not be present but can only be matched one time.
  • E.g. [a-z]? matches empty string or any single lower case letter.

* Match zero or more of the pattern defined before it.

  • E.g. Wildcard for pattern that may or may not be present.
  • E.g. [a-z]* matches empty string or string of lower case letters.

. Matches any character except newline \n

  • E.g. a. Matches a two character string starting with a and ending with anything except \n

| OR operator

  • E.g. a|b means either a or b can be matched.
  • E.g. red|white|orange matches exactly one of the colors.

^ NOT operator

  • E.g. [^0-9] character can not contain a number
  • E.g. [^aA] character can not be lower case a or upper case A

\ Escapes special character that follows (overrides above behavior)

  • E.g. \., \\, \(, \?, \$, \^

Anchoring Patterns:

^ Match must occur at start of string

  • E.g. ^a First character must be lower case letter a
  • E.g. ^[0-9] First character must be a number.

$ Match must occur at end of string

  • E.g. a$ Last character must be lower case letter a

Precedence table:

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |

Predefined Character Abbreviations:

abr    same as       meaning
\d     [0-9]         Any single digit
\D     [^0-9]        Any single character that's not a digit
\w     [a-zA-Z0-9_]  Any word character
\W     [^a-zA-Z0-9_] Any non-word character
\s     [ \r\t\n\f]   Any space character
\S     [^ \r\t\n\f]  Any non-space character
\n     [\n]          New line

Example 1: Run as macro

The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1 values of 12abc will return abc, value of 1abc will return abc, value of abc123 will return "Not Matched" because the digits were not at the start of the string.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1")
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub

Example 2: Run as an in-cell function

This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    
    strPattern = "^[0-9]{1,3}"
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

Place your strings ("12abc") in cell A1. Enter this formula =simpleCellRegex(A1) in cell B1 and the result will be "abc".

results image


Example 3: Loop Through Range

This example is the same as example 1 but loops through a range of cells.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A5")
    
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub

Example 4: Splitting apart different patterns

This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the (). $1 represents the first pattern matched within the first set of ().

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1:A3")
    
    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub

Results:

results image


Additional Pattern Examples

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters

</i8>    \<\/[a-zA-Z][0-9]\>            Exact non-word character except any single alpha followed by any single digit
TylerH
  • 20,799
  • 66
  • 75
  • 101
Automate This
  • 30,726
  • 11
  • 60
  • 82
  • 35
    You should not forget to `Set regEx = Nothing`. You will get Out Of Memory exceptions, when that Sub is executed frequently enought. – Kiril Mar 13 '15 at 10:28
  • 1
    I adapted example 4 with [SubMatches](http://www.regular-expressions.info/vbscript.html) for treating more complex regex, basically I don't use replace when splitting, if anyone is interested: http://stackoverflow.com/questions/30218413/regex-capturing-groups-within-capture-groups – Armfoot May 13 '15 at 14:58
  • 17
    Late binding line: `Set regEx = CreateObject("VBScript.RegExp")` – ZygD Dec 05 '15 at 11:23
  • This is an excellent answer, but with the following reservations. (1) This regular expression syntax should be supported by most regular expression engines, not just the VBScript regex library. (2) It's true that Excel has a unique way (other than iterating over the objects) of calling functions (in-cell functions) for use with regular expressions, but so does [Access](http://stackoverflow.com/a/10395393/111794). – Zev Spitz Dec 18 '15 at 08:45
  • It seems that Microsoft VBScript Regular Expressions 5.5 is not available in OS X / macOS? I do not have that option available in Microsoft Office 2016 on Mac. From what I can tell, it is pretty much impossible to get native regex support in Mac versions. – user5359531 Feb 09 '17 at 17:55
  • 1
    Great answer. One improvement: I've had trouble getting the carat (`^`) to work as negation, but recently found [this answer](https://superuser.com/a/1300692/681552) which uses `?!`. This works for me. For instance, finding "nurse" but not "practitioner": `(?=.*(nurse))(?!(?=.*(practitioner)))` – DukeSilver Jul 10 '18 at 22:46
  • Can one match `we` by just using `we` and how does one match a starting and ending word boundary? I tried `\Bwe\B` with `IgnoreCase = True` but no luck. Also not sure what `Global` does. – Superdooperhero Feb 03 '19 at 15:16
  • this answer is not useful for Excel on Mac as Microsoft VBScript Regular Expressions 5.5 doesn't seem to be available. – Giacomo Feb 08 '19 at 10:44
  • @giac_man You're right, and I never claimed it was. RegExp is one of many that VBA on the Mac does not have. See [this thread about RegEx on Mac](https://stackoverflow.com/q/13995007/2521004) – Automate This Feb 08 '19 at 17:39
  • I get #NAME? error. Have pasted example 2 code into VBA Project (myfilename.xlsm) > Microsoft Excel Objects > ThisWorkbook – youcantryreachingme May 07 '19 at 01:34
  • @youcantryreachingme - What line is the error on? Are you on windows or mac? What version of Excel? Did you include the reference library in step 1? I've tested this on several systems with no errors so I'm curious of your setup. – Automate This May 07 '19 at 14:49
  • @PortlandRunner Office 365 ProPlus Excel version 1808 Build 10730.20334. Yes I went through step 1 but on re-checking the setting today, it hasn't persisted (and my OS seems to have restarted during the last 48 hours while the laptop was sleeping). Enabled that reference again now, saved, and it persists. Code still present in location described in last comment. I enter following into a cell: "=simpleCellRegex(L496)" and get #NAME?. Cell contains an email address. I updated this strPattern to match valid emails. OS Win10 Pro 64bit (10.0 Build 17134). – youcantryreachingme May 08 '19 at 22:59
  • 2
    Okay, I'm pretty sure it's because the code is in `ThisWorkbook`. Try moving the code to a separate `Module`. – Automate This May 09 '19 at 04:00
  • 4
    @PortlandRunner in the "project explorer" (?) this excel file lacked a "Modules" subfolder, although another file showed one. Right-clicked the file and chose 'insert module', then double-clicked "Module 1" and pasted the code. Saved. Back to workbook and keyed in the function again - it worked. Might be noteworthy in the answer, for the sake of the inexperienced like me? Thanks for the help. – youcantryreachingme May 10 '19 at 05:52
  • `Static` declaration might improve performance, [check the example](https://stackoverflow.com/a/34601029/2165759). – omegastripes Jul 23 '19 at 18:25
  • What I would find a valuable contribution (a suggestion) to this post is to *include* what VBA does **not** support (or the other way around: what **is** supported). For example, for some reason positive/negative lookaheads are supported while lookbehinds are not. Or would this fall outside the scope over here? @PortlandRunner – JvdV Feb 12 '20 at 12:37
  • @JvdV - I agree, that would be useful information. If you would like to post an additional answer with this valuable information I'll support it :-) – Automate This Feb 12 '20 at 15:04
  • I started to follow your answer's instructions in Excel (16.0.14026.20270) 64-bit, but I noticed after doing step one to add the VBA reference, that Excel turned off AutoSave because "Parts of your document may include personal information that can't be removed by the Document Inspector". What's this about? I'm not too worried about personal information in the first place, since I'm not sharing this document, but why does it consider the VBA reference to be "personal information? And why doesn't disabling the reference remove this warning? – Twisted on STRIKE at1687989253 Jun 19 '21 at 19:35
  • 6
    Unreal... simple indie tools like Notepad++ have a "regex" option in their Find and Replace... but in a world class tool like Excel, you have to be a programmer to do it , and in the most obscure and complicated way.. – Ciabaros Sep 16 '21 at 16:30
  • 1
    @Kiril: You do **not** need `Set regEx = Nothing`. Since `regEx` is a local variable, it is automatically freed and the reference released once the function ran to completion. – Iziminza Jul 26 '22 at 14:33
  • is there a "tab" regex ? – Alvaro Morales Aug 06 '23 at 18:30
236

To make use of regular expressions directly in Excel formulas the following UDF (user defined function) can be of help. It more or less directly exposes regular expression functionality as an excel function.

How it works

It takes 2-3 parameters.

  1. A text to use the regular expression on.
  2. A regular expression.
  3. A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.

Some examples

Extracting an email address:

=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")

Results in: some@email.com

Extracting several substrings:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")

Results in: E-Mail: some@email.com, Name: Peter Gordon

To take apart a combined string in a single cell into its components in multiple cells:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)

Results in: Peter Gordon some@email.com ...

How to use

To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):

  1. In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor.
  2. Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
    1. Click on Tools -> References (please excuse the german screenshot) Tools -> References
    2. Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
    3. Click OK.
  3. Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors).

    Second icon in the icon row -> Module

  4. In the big text window in the middle insert the following:

    Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
        Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
        Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
        Dim replaceNumber As Integer
    
        With inputRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = matchPattern
        End With
        With outputRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = "\$(\d+)"
        End With
        With outReplaceRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
        End With
    
        Set inputMatches = inputRegexObj.Execute(strInput)
        If inputMatches.Count = 0 Then
            regex = False
        Else
            Set replaceMatches = outputRegexObj.Execute(outputPattern)
            For Each replaceMatch In replaceMatches
                replaceNumber = replaceMatch.SubMatches(0)
                outReplaceRegexObj.Pattern = "\$" & replaceNumber
    
                If replaceNumber = 0 Then
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
                Else
                    If replaceNumber > inputMatches(0).SubMatches.Count Then
                        'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                        regex = CVErr(xlErrValue)
                        Exit Function
                    Else
                        outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                    End If
                End If
            Next
            regex = outputPattern
        End If
    End Function
    
  5. Save and close the Microsoft Visual Basic for Applications Editor window.

Patrick Böker
  • 3,173
  • 1
  • 18
  • 24
  • 7
    This answer combined with the steps [here](http://stackoverflow.com/a/2633992/1698058) to create an Add-In, has been very helpful. Thank you. Make sure you don't give your module and function the same name! – Chris Hunt Feb 24 '15 at 19:03
  • 2
    Just reiterating the comment above from Chris Hunt. Don't call your Module 'Regex' as well. Thought I was going mad for a while as the function wouldn't work due to a #NAME error – Chris Sep 28 '15 at 14:57
  • Well, I'm gone nuts as I tried everything (including changing modules/names) and still getting the #NAME error >_> http://i.imgur.com/UUQ6eCi.png – Enissay Aug 15 '16 at 20:46
  • @Enissay: Try creating a minimal `Function foo() As Variant \n foo="Hello World" \n End Function` UDF to see if that works. If yes, work your way up to the full thing above, if no something basic is broken (macros disabled?). – Patrick Böker Aug 16 '16 at 07:27
  • I would recommend [This tool](http://blog.malcolmp.com/2010/regular-expressions-excel-add-in) it works better than the above version, there are flaws in the above version, doesn't work well with multiline matching. And it is an addin so easier to use. – Vijay Apr 27 '17 at 09:24
  • Anyone know what might have changed in three years? Today, with Excel 2003 build 12624.20382 (part of Office Pro Plus 2016 - fully updated; install in Win10 Pro OS version 1909 build 18363.720), all 5 examples above uniformly return the zero (0). No errors, just the number zero (0). Too bad regex is not a native part of such a ubiquitous application. – Dante Apr 12 '20 at 16:52
  • 2
    @Vijay: same at the https://github.com/malcolmp/excel-regular-expressions – Vadim Apr 28 '20 at 18:30
94

Expanding on patszim's answer for those in a rush.

  1. Open Excel workbook.
  2. Alt+F11 to open VBA/Macros window.
  3. Add reference to regex under Tools then References
    ![Excel VBA Form add references
  4. and selecting Microsoft VBScript Regular Expression 5.5
    ![Excel VBA add regex reference
  5. Insert a new module (code needs to reside in the module otherwise it doesn't work).
    ![Excel VBA insert code module
  6. In the newly inserted module,
    ![Excel VBA insert code into module
  7. add the following code:

    Function RegxFunc(strInput As String, regexPattern As String) As String
        Dim regEx As New RegExp
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .pattern = regexPattern
        End With
    
        If regEx.Test(strInput) Then
            Set matches = regEx.Execute(strInput)
            RegxFunc = matches(0).Value
        Else
            RegxFunc = "not matched"
        End If
    End Function
    
  8. The regex pattern is placed in one of the cells and absolute referencing is used on it. ![Excel regex function in-cell usage Function will be tied to workbook that its created in.
    If there's a need for it to be used in different workbooks, store the function in Personal.XLSB

Community
  • 1
  • 1
SAm
  • 2,154
  • 28
  • 28
  • 3
    Thanks for mentioning it needs to be in Personal.xlsb to be available in all Excel documents you work on. Most (?) other answers don't make that clear. Personal.XLSB would go in the folder (might need to create the folder) C:\Users\user name\AppData\Local\Microsoft\Excel\XLStart folder – Mark Stewart Jun 07 '19 at 14:29
  • I chose this approach. However, there is a problem for me with Office 365. I noticed, if I open the xlsm file the other day, formulas with RegxFunc turn #NAME. Actually, to workaround this, I need to recreate the file. Any suggestions? – HoRn Mar 18 '21 at 11:47
  • @HoRn #Name? You might want to try this so answer, https://stackoverflow.com/a/18841575/1699071. It states that the function name and module name were the same. The fix was to rename either the module name or the function name. Other posts on the same page also might help. – SAm Mar 18 '21 at 15:38
  • 1
    I gave up on trying to get personal.xlsb working. Instead I put this function in my clipboard buffer's permanent collection (arsclip) and will just create a new module whenever I need it. It's laughable how difficult this is for a function that should, by 2021, be native to Excel. PS: Right in the middle of this, Stack asked me to *pay* for pasting. Y'all, it's April 2. ^april\x20?0?1$' fails today. Ya got me. – wistlo Apr 02 '21 at 22:14
  • For some people from Non-English countries this may be interesting: You have to use a semicolon ";" instead of a comma "," in `RegxFunc(B5,$C$2)` – devbf Apr 30 '21 at 09:16
32

Here is my attempt:

Function RegParse(ByVal pattern As String, ByVal html As String)
    Dim regex   As RegExp
    Set regex = New RegExp
    
    With regex
        .IgnoreCase = True  'ignoring cases while regex engine performs the search.
        .pattern = pattern  'declaring regex pattern.
        .Global = False     'restricting regex to find only first match.
        
        If .Test(html) Then         'Testing if the pattern matches or not
            mStr = .Execute(html)(0)        '.Execute(html)(0) will provide the String which matches with Regex
            RegParse = .Replace(mStr, "$1") '.Replace function will replace the String with whatever is in the first set of braces - $1.
        Else
            RegParse = "#N/A"
        End If 
    End With
End Function
Vikas Gautam
  • 1,793
  • 22
  • 21
31

This isn't a direct answer but may provide a more efficient alternative for your consideration. Which is that Google Sheets has several built in Regex Functions these can be very convenient and help circumvent some of the technical procedures in Excel. Obviously there are some advantages to using Excel on your PC but for the large majority of users Google Sheets will offer an identical experience and may offer some benefits in portability and sharing of documents.

They offer

REGEXEXTRACT: Extracts matching substrings according to a regular expression.

REGEXREPLACE: Replaces part of a text string with a different text string using regular expressions.

SUBSTITUTE: Replaces existing text with new text in a string.

REPLACE: Replaces part of a text string with a different text string.

You can type these directly into a cell like so and will produce whatever you'd like

=REGEXMATCH(A2, "[0-9]+")

They also work quite well in combinations with other functions such as IF statements like so:

=IF(REGEXMATCH(E8,"MiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*")/1000,IF(REGEXMATCH(E8,"GiB"),REGEXEXTRACT(E8,"\d*\.\d*|\d*"),"")

enter image description here

Hopefully this provides a simple workaround for those users who feel daunted by the VBS component of Excel.

Brian THOMAS
  • 482
  • 1
  • 5
  • 19
Alex Roseland
  • 461
  • 4
  • 10
  • 6
    Thanks for sharing Alex. This is useful for those looking for Google version. You might consider writing & answering another question specific to Google Sheets Regex as it has it's own nomenclature and would be very useful to others. Regardless, you have my upvote! – Automate This Nov 27 '19 at 00:08
  • Can you create a module that supports google syntax exactly in excel? – Vijay Jul 02 '20 at 10:39
17

To add to the valuable content, I would like to create this reminder on why sometimes RegEx within VBA is not ideal. Not all expressions are supported, but instead may throw an Error 5017 and may leave the author guessing (which I am a victim of myself).

Whilst we can find some sources on what is supported, it would be helpfull to know which metacharacters etc. are not supported. A more in-depth explaination can be found here. Mentioned in this source:

"Although "VBScript’s regular expression ... version 5.5 implements quite a few essential regex features that were missing in previous versions of VBScript. ... JavaScript and VBScript implement Perl-style regular expressions. However, they lack quite a number of advanced features available in Perl and other modern regular expression flavors:"


So, not supported are:

  • Start of String ancor \A, alternatively use the ^ caret to match postion before 1st char in string
  • End of String ancor \Z, alternatively use the $ dollar sign to match postion after last char in string
  • Positive LookBehind, e.g.: (?<=a)b (whilst postive LookAhead is supported)
  • Negative LookBehind, e.g.: (?<!a)b (whilst negative LookAhead is supported)
  • Atomic Grouping
  • Possessive Quantifiers
  • Unicode e.g.: \{uFFFF}
  • Named Capturing Groups. Alternatively use Numbered Capturing Groups
  • Inline modifiers, e.g.: /i (case sensitivity) or /g (global) etc. Set these through the RegExp object properties > RegExp.Global = True and RegExp.IgnoreCase = True if available.
  • Conditionals
  • Regular Expression Comments. Add these with regular ' comments in script

I already hit a wall more than once using regular expressions within VBA. Usually with LookBehind but sometimes I even forget the modifiers. I have not experienced all these above mentioned backdrops myself but thought I would try to be extensive referring to some more in-depth information. Feel free to comment/correct/add. Big shout out to regular-expressions.info for a wealth of information.

P.S. You have mentioned regular VBA methods and functions, and I can confirm they (at least to myself) have been helpful in their own ways where RegEx would fail.

JvdV
  • 70,606
  • 8
  • 39
  • 70
10

I needed to use this as a cell function (like SUM or VLOOKUP) and found that it was easy to:

  1. Make sure you are in a Macro Enabled Excel File (save as xlsm).
  2. Open developer tools Alt + F11
  3. Add Microsoft VBScript Regular Expressions 5.5 as in other answers
  4. Create the following function either in workbook or in its own module:

    Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant
        Dim regex As New VBScript_RegExp_55.RegExp
        Dim strInput As String
    
        strInput = myRange.Value
    
        With regex
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = matchPattern
        End With
    
        REGPLACE = regex.Replace(strInput, outputPattern)
    
    End Function
    
  5. Then you can use in cell with =REGPLACE(B1, "(\w) (\d+)", "$1$2") (ex: "A 243" to "A243")

Neuron
  • 5,141
  • 5
  • 38
  • 59
Andrew Wynham
  • 2,310
  • 21
  • 25
  • This naming of outputPattern threw me off. It's the replacement value. – Thor Jun 01 '17 at 17:47
  • 1
    Yes. I suppose I left it named pattern so it was clear it wasn't just string substitution and you could use regex matching groups like $1 $2 etc. – Andrew Wynham Jun 01 '17 at 17:50
8

Here is a regex_subst() function. Examples:

=regex_subst("watermellon", "[aeiou]", "")
---> wtrmlln
=regex_subst("watermellon", "[^aeiou]", "")
---> aeeo

Here is the simplified code (simpler for me, anyway). I couldn't figure out how to build a suitable output pattern using the above to work like my examples:

Function regex_subst( _
     strInput As String _
   , matchPattern As String _
   , Optional ByVal replacePattern As String = "" _
) As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With

    regex_subst = inputRegexObj.Replace(strInput, replacePattern)
End Function
Neuron
  • 5,141
  • 5
  • 38
  • 59
jgreve
  • 1,225
  • 12
  • 17
2

I don't want to have to enable a reference library as I need my scripts to be portable. The Dim foo As New VBScript_RegExp_55.RegExp line caused User Defined Type Not Defined errors, but I found a solution that worked for me.

Update RE comments w/ @chrisneilsen :

I was under the impression that enabling a reference library was tied to the local computers settings, but it is in fact, tied directly to the workbook. So, you can enable a reference library, share a macro enabled workbook and the end user wouldn't have to enable the library as well. Caveat: The advantage to Late Binding is that the developer does not have to worry about the wrong version of an object library being installed on the user's computer. This likely would not be an issue w/ the VBScript_RegExp_55.RegExp library, but I'm not sold that the "performance" benifit is worth it for me at this time, as we are talking imperceptible milliseconds in my code. I felt this deserved an update to help others understand. If you enable the reference library, you can use "early bind", but if you don't, as far as I can tell, the code will work fine, but you need to "late bind" and loose on some performance/debugging features.

Source: https://peltiertech.com/Excel/EarlyLateBinding.html

What you'll want to do is put an example string in cell A1, then test your strPattern. Once that's working adjust then rng as desired.

Public Sub RegExSearch()
'https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops
'https://wellsr.com/vba/2018/excel/vba-regex-regular-expressions-guide/
'https://www.vitoshacademy.com/vba-regex-in-excel/
    Dim regexp As Object
    'Dim regex As New VBScript_RegExp_55.regexp 'Caused "User Defined Type Not Defined" Error
    Dim rng As Range, rcell As Range
    Dim strInput As String, strPattern As String
    
    Set regexp = CreateObject("vbscript.regexp")
    Set rng = ActiveSheet.Range("A1:A1")
        
    strPattern = "([a-z]{2})([0-9]{8})"
    'Search for 2 Letters then 8 Digits Eg: XY12345678 = Matched

    With regexp
        .Global = False
        .MultiLine = False
        .ignoreCase = True
        .Pattern = strPattern
    End With

    For Each rcell In rng.Cells

        If strPattern <> "" Then
            strInput = rcell.Value

            If regexp.test(strInput) Then
                MsgBox rcell & " Matched in Cell " & rcell.Address
            Else
                MsgBox "No Matches!"
            End If
        End If
    Next
End Sub
FreeSoftwareServers
  • 2,271
  • 1
  • 33
  • 57
  • 1
    _I don't want to have to enable a reference library as I need my scripts to be portable_ - the `VBScript_RegExp_55` library is pretty much ubiquitous so carries very low risk of not being on a particular target machine. And anyway, switching from Early Bound to Late Bound doesn't solve the portability problem (the code will still error, only at run time rather than compile time) – chris neilsen Feb 21 '20 at 01:36
  • 2
    Sure, but the RegEx code won't be _usable_. If the RegEx stuff is part of a larger library workbook, I guess having it not throw a compile error therefore allowing other parts to be used could be considered useful – chris neilsen Feb 21 '20 at 03:08
  • 1
    My point was _if_ the code doesn't work early bound (with the required reference in place) then it won't work late bound either. As to the portability point, that has merit where there may be different versions of the same library on different PCs. Hardly an issue here, Regex library hasn't changed in over a decade, and isn't likely to change anytime soon. Anyway, what do you mean by "changes to default setup"? A library reference is part of the Workbook, not the PC setup. – chris neilsen Sep 17 '20 at 02:16
  • I run almost exclusively macros against the `ActiveWorkbook` via `xlam` add-ins and from what I understand, enabling a reference library was tied to the local PC MS Office settings. If, this is somehow tied to the `xlam` that can be shared without users having to do anything on thier end, then perhaps I am mistaken. I understand better now you are saying that Early vs Late has different error handling, but I'm not concerned with that. – FreeSoftwareServers Sep 17 '20 at 02:19
  • Which library did you have that experience with? – chris neilsen Sep 17 '20 at 02:32
  • Are you saying you had that experience with `VBScript_RegExp_55`? – chris neilsen Sep 17 '20 at 02:35
  • 1
    I've been using the VBA Regex library for about 20yrs, and never experienced an error that switching from early to late bound solved. If you can demonstrate one, I'd be interested to see it (would make for a good question in itself). In the meantime I stand by my original comment. – chris neilsen Sep 25 '20 at 01:34
  • @chrisneilsen I updated answer, seems reference libraries are directly tied to workbook, not local computer. I was able to open two workbooks and compare different enabled reference libraries. Took me a while to test, but I appreciate the knowledge moving forward I have gained regarding enable ref libraries. I'm less afraid now, but my macros will/could be used on terminals ranging from Win XP to Office 365 so portability is key, but this library isn't likely going anywhere, – FreeSoftwareServers Oct 20 '20 at 01:45
  • If you look at the object model for a VBA project, you'll find that a `VBProject` object has a `References` property. Library references are saved as part of the VBA project in the host document. – Mathieu Guindon Jan 16 '22 at 15:09