28

It appears that regex (as in regular expressions) is not supported in Excel, except via VBA. Is this so, and if it is, are there any "open source" custom VBA functions that support regex? In this case I'm looking to extract complex pattern within a string, and any implementation of a custom VBA function that expose support of regex within the function itself would be of use. If you know of semi-related function such as the IS function, feel free to comment, though I'm really looking for a full regular expression implementation that is exposed via functions.

Also, just a heads up that I'm using Office 2010 on Windows 7; added this info after an answer that appears to be a great suggestion turned out not to work on Office 2010.

ZygD
  • 22,092
  • 39
  • 79
  • 102
blunders
  • 3,619
  • 10
  • 43
  • 65
  • Another user (@user500414) posted http://ramblings.mcpher.com/Home/excelquirks/regular-expressions as an answer, but it was deleted as a comment, it's not a bad implementation with full code of the general regex VBA interface. – Orbling Aug 19 '13 at 18:59
  • There is also an Add-In, written in C++, so there's a `.xll` file to install, but the code is available: http://xllregex.codeplex.com/ – Orbling Aug 19 '13 at 19:13

6 Answers6

40

Nothing built into Excel. VBScript has built-in support and can be called from VBA. More info available here. You can call the object using late binding in VBA. I've included a few functions that I put together recently. Please note that these are not well-tested and may have some bugs, but they are pretty straightforward.

This should at least get you started:

'---------------------------------------------------------------------------------------vv
' Procedure : RegEx
' Author    : Mike
' Date      : 9/1/2010
' Purpose   : Perform a regular expression search on a string and return the first match
'               or the null string if no matches are found.
' Usage     : If Len(RegEx("\d{1,2}[/-]\d{1,2}[/-]\d{2,4}", txt)) = 0 Then MsgBox "No date in " & txt
'           : TheDate = RegEx("\d{1,2}[/-]\d{1,2}[/-]\d{2,4}", txt)
'           : CUSIP = Regex("[A-Za-z0-9]{8}[0-9]",txt)
'---------------------------------------------------------------------------------------
'^^
Function RegEx(Pattern As String, TextToSearch As String) As String 'vv
    Dim RE As Object, REMatches As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = False
        .IgnoreCase = False
        .Pattern = Pattern
    End With

    Set REMatches = RE.Execute(TextToSearch)
    If REMatches.Count > 0 Then
        RegEx = REMatches(0)
    Else
        RegEx = vbNullString
    End If
End Function '^^

'---------------------------------------------------------------------------------------
' Procedure : RegExReplace
' Author    : Mike
' Date      : 11/4/2010
' Purpose   : Attempts to replace text in the TextToSearch with text and back references
'               from the ReplacePattern for any matches found using SearchPattern.
' Notes     - If no matches are found, TextToSearch is returned unaltered.  To get
'               specific info from a string, use RegExExtract instead.
' Usage     : ?RegExReplace("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My phone # is 570.555.1234.", "$1($2)$3-$4$5")
'             My phone # is (570)555-1234.
'---------------------------------------------------------------------------------------
'
Function RegExReplace(SearchPattern As String, TextToSearch As String, ReplacePattern As String, _
                      Optional GlobalReplace As Boolean = True, _
                      Optional IgnoreCase As Boolean = False, _
                      Optional MultiLine As Boolean = False) As String
Dim RE As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = MultiLine
        .Global = GlobalReplace
        .IgnoreCase = IgnoreCase
        .Pattern = SearchPattern
    End With

    RegExReplace = RE.Replace(TextToSearch, ReplacePattern)
End Function

'---------------------------------------------------------------------------------------
' Procedure : RegExExtract
' Author    : Mike
' Date      : 11/4/2010
' Purpose   : Extracts specific information from a string.  Returns empty string if not found.
' Usage     : ?RegExExtract("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My phone # is 570.555.1234.", "$2$3$4")
'             5705551234
'             ?RegExExtract("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My name is Mike.", "$2$3$4")
'
'             ?RegExReplace("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My name is Mike.", "$2$3$4")
'             My name is Mike.
'---------------------------------------------------------------------------------------
'
Function RegExExtract(SearchPattern As String, TextToSearch As String, PatternToExtract As String, _
                      Optional GlobalReplace As Boolean = True, _
                      Optional IgnoreCase As Boolean = False, _
                      Optional MultiLine As Boolean = False) As String
Dim MatchFound As Boolean

    MatchFound = Len(RegEx(SearchPattern, TextToSearch)) > 0
    If MatchFound Then
        RegExExtract = RegExReplace(SearchPattern, TextToSearch, PatternToExtract, _
                                    GlobalReplace, IgnoreCase, MultiLine)
    Else
        RegExExtract = vbNullString
    End If
End Function
mwolfe02
  • 23,787
  • 9
  • 91
  • 161
  • +1 Great, thanks -- just a quick question before I attempt to use the code (which looks super, thanks) -- what system did you use this on? I'm on Window 7, Office 2010 using 64-bit. Again, thanks! – blunders Dec 29 '10 at 19:30
  • 1
    Windows 7 64-bit, Office XP. The `vbscript.regexp` object ships with windows, not office, so it should work fine for you. – mwolfe02 Dec 30 '10 at 13:46
  • +1 This regular expression engine ships with (some versions of) Internet Explorer, as far as I know. I've used it in my sorting add-in LAselect and it apears to work just fine for sorting street names, human names, postcodes and the like without resorting to splitting cells. – Avan Dec 02 '12 at 13:41
  • 1
    You can also call it using early binding, by adding a reference to the VBScript Regular Expression library. – Zev Spitz Mar 04 '21 at 21:08
  • +1 for `CreateObject("vbscript.regexp")`, which alleviates the need to manually add in support via. `Tools -> References -> Microsoft VBScript Regular Expressions 5.5` – Andrew Jun 02 '23 at 21:57
9

Here's a post regarding Regex usage in Excel:

http://mathfest.blogspot.com/2010/03/regular-expressions-in-excel.html

Hope it helps.

And another which uses Python and IronSpread

http://mathfest.blogspot.ca/2012/06/using-ironspread-and-regular.html

rossisen
  • 3
  • 2
Rion Williams
  • 74,820
  • 37
  • 200
  • 327
  • 1
    +1 Thanks, appears that's only compatible with Excel 95 to 2007 -- I'm using 2010 on Win7; would have thought any solution would work on my platform, but guess not. I'll add that info to my question. – blunders Dec 29 '10 at 19:01
  • 2
    Just an update that it appears the issue is related to 64-bit on Office2010-Win7 http://www.mrexcel.com/forum/showthread.php?t=467397 At this point really don't want to switch to 32-bit Office 2010 just for this, but thought I'd add this info to your answer in case someone else was looking into it. – blunders Dec 29 '10 at 19:27
8

regexp use within functions is included in OpenOffice/LibreOffice Calc. To activate go to Tools > Options > Calc > Calculate: Y = Enable Regular Expressions in Formulas.

ZygD
  • 22,092
  • 39
  • 79
  • 102
Seth
  • 81
  • 1
  • 1
  • 1
    OpenOffice Documentation: [Regular Expressions in Calc](http://wiki.openoffice.org/wiki/Documentation/How_Tos/Regular_Expressions_in_Calc) – piotr_cz May 14 '13 at 13:02
2

I've tried a couple of solutions and given my lack of expertise on VBA I found most of them too cumbersome for me. The easiest one I found was SeoTools for Excel (http://nielsbosma.se/projects/seotools/). Worked like a charm for me.

Paulo S. Abreu
  • 183
  • 1
  • 3
  • 8
0

--- FEB 2014 ---

Just to give an alternative, both Open Office and Libre Office Calc software (their spreadsheet software name) allow for regular expressions in their search functionality.

Robert Brisita
  • 5,461
  • 3
  • 36
  • 35
  • Unclear why your answer is on topic or why "FEB 2014" is in your answer. Please explain, thanks. – blunders Feb 14 '14 at 20:43
  • 2
    As it states in my answer it is an updated alternative rather than the functionality that was stated and accepted four years ago. I bold the month and year for someone who would scroll through the answers and not have to search for how old an answer is. – Robert Brisita Feb 14 '14 at 22:36
-1

I recently had the same exact question and after struggling with creating my own tools and making them work correctly I found a great online ADDIN that's very EASY to USE.

This is the creator's excerpt

For my internship over the past several months I’ve been working in the Marketing Science department and part of my job has been getting data into MS Access and generating reports. This involves getting lists of prospects from various data sources. This was usually a pretty simple feat involving some basic SQL queries. However, sometimes I was handed data such as addresses that didn’t match any standard format used by IT. In the worst case the data was provided in a pdf which meant I could only export it to a non-delimited text file. I found that I really needed a couple generic regular expression functions to parse out fields to import into MS Access. I found some .xla examples online but I really wanted an easier to use, more extensive, and portable library. I also wanted to include a few basic patterns for so it wasn’t necessary to re-invent the wheel every time.

So, I created a simple Excel Add-In Regular Expressions.xla that adds several custom functions to implement the standard VBScript regular expressions.

Here is the website

I've used it successfully to extract useful text using regex.

Here's the code in the addin

' Regular Expressions.xla
'
' ? 2010 Malcolm Poindexter
' This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation, version 3.
' This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
' without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
' See the GNU General Public License for more details. http://www.gnu.org/licenses/gpl.html
'
' I would appreciate if you would notify me of any modifications or re-distributions of this code at contact@malcolmp.com
' and appropriately attribute me as the original author in the header.
' ------------------------------------------------------------------------------------------------------------------------
'
' This file provides an Excel Add-In containing regular expression processing functions and several pre-defined regular expressions.
' The regular expressions provided are not necessarially exhaustive, but are intended to cover the most common cases.
'
' Regular Expressions Syntax: http://msdn.microsoft.com/en-us/library/1400241x%28VS.85%29.aspx

' -----------------------------
' NAME: xREPLACE
' DESCRIPTION: Replace all portions of the search text matching the pattern with the replacement text.
' -----------------------------
Function xREPLACE(pattern As String, searchText As String, replacementText As String, Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = pattern
    RegEx.ignoreCase = ignoreCase
    xREPLACE = RegEx.Replace(searchText, replacementText)
End Function

' -----------------------------
' NAME: xMATCHES
' DESCRIPTION: Find and return the number of matches to a pattern in the search text.
' -----------------------------
Function xMATCHES(pattern As String, searchText As String, Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = pattern
    RegEx.ignoreCase = ignoreCase
    Dim matches As MatchCollection
    Set matches = RegEx.Execute(searchText)
    xMATCHES = matches.Count
End Function

' -----------------------------
' NAME: xMATCH
' DESCRIPTION: Find and return an instance of a match to the pattern in the search text. MatchIndex may be used in the case of multiple matches.
' -----------------------------
Function xMATCH(pattern As String, searchText As String, Optional matchIndex As Integer = 1, _
                Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = pattern
    RegEx.ignoreCase = ignoreCase
    Dim matches As MatchCollection
    Set matches = RegEx.Execute(searchText)
    Dim i As Integer
    i = 1
    For Each Match In matches
        If i = matchIndex Then
            xMATCH = Match.Value
        End If
        i = i + 1
    Next
End Function

' -----------------------------
' NAME: xMATCHALL
' DESCRIPTION: Find and return a comma-separated list of all matches to the pattern in the search text.
' -----------------------------
Function xMATCHALL(pattern As String, searchText As String, Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = pattern
    RegEx.ignoreCase = ignoreCase
    Dim matches As MatchCollection
    Set matches = RegEx.Execute(searchText)
    Dim i As Integer
    i = 1
    Dim returnMatches As String
    returnMatches = ""
    For Each Match In matches
        If i = 1 Then
            returnMatches = Match.Value
        Else
            returnMatches = returnMatches + "," + Match.Value
        End If
        i = i + 1
    Next
    xMATCHALL = returnMatches
End Function

' -----------------------------
' NAME: xGROUP
' DESCRIPTION: Find and return a group from within a matched pattern.
' -----------------------------
Function xGROUP(pattern As String, searchText As String, group As Integer, Optional matchIndex As Integer = 1, _
                Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    If group <> 0 Then
        group = group - 1
    End If
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = pattern
    RegEx.ignoreCase = ignoreCase
    Dim matches As MatchCollection
    Set matches = RegEx.Execute(searchText)
    Dim i As Integer
    i = 1
    For Each Match In matches
        If i = matchIndex Then
            xGROUP = Match.SubMatches(group)
        End If
        i = i + 1
    Next
End Function

' -----------------------------
' NAME: xSTARTSWITH
' DESCRIPTION: Returns true or false if the search text starts with the pattern.
' -----------------------------
Function xSTARTSWITH(pattern As String, searchText As String, Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = "^" + pattern
    RegEx.ignoreCase = ignoreCase
    Dim matches As MatchCollection
    Set matches = RegEx.Execute(searchText)
    xSTARTSWITH = matches.Count > 0
End Function

' -----------------------------
' NAME: xENDSWITH
' DESCRIPTION: Returns true or false if the search text ends with the pattern.
' -----------------------------
Function xENDSWITH(pattern As String, searchText As String, Optional ignoreCase As Boolean = True) As String
    On Error Resume Next
    Dim RegEx As New RegExp
    RegEx.Global = True
    RegEx.MultiLine = True
    RegEx.pattern = pattern + "$"
    RegEx.ignoreCase = ignoreCase
    Dim matches As MatchCollection
    Set matches = RegEx.Execute(searchText)
    xENDSWITH = matches.Count > 0
End Function

' ************************************
' Regular Expression Definitions
' ************************************

' -----------------------------
' NAME: xxEMAIL
' DESCRIPTION: Pattern to match an email address.
' -----------------------------
Function xxEMAIL() As String
    xxEMAIL = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
End Function

' -----------------------------
' NAME: xxUSZIP
' DESCRIPTION: Pattern to match an US Zip code.
' -----------------------------
Function xxUSZIP() As String
    xxUSZIP = "\b(?!0{5})(\d{5})(?!-0{4})(-\d{4})?\b"
End Function

' -----------------------------
' NAME: xxPHONE
' DESCRIPTION: Pattern to match a phone number.
' -----------------------------
Function xxPHONE() As String
    xxPHONE = "\b[01]?[- .]?\(?[2-9]\d{2}\)?\s?[- .]?\s?\d{3}\s?[- .]?\s?\d{4}(\s*(x|(ext))[\.]?\s*\d{1,6})?\b"
End Function

' -----------------------------
' NAME: xxURL
' DESCRIPTION: Pattern to match a url.
' -----------------------------
Function xxURL() As String
    xxURL = "\b((ftp)|(https?))\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}\b"
End Function


' ************************************
'   Insert Function Dialog Category Setup
' ************************************
Sub AddCategoryDescription()
    Application.MacroOptions Macro:="xREPLACE", _
        Description:="Replace all portions of the search text matching the pattern with the replacement text.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xMATCHES", _
        Description:="Find and return the number of matches to a pattern in the search text.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xMATCH", _
        Description:="Find and return an instance of a match to the pattern in the search text. MatchIndex may be used in the case of multiple matches.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xMATCHALL", _
        Description:="Find and return a comma-separated list of all matches to the pattern in the search text.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xGROUP", _
        Description:="Find and return a group from within a matched pattern.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xSTARTSWITH", _
        Description:="Returns true or false if the search text starts with the pattern.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xENDSWITH", _
        Description:="Returns true or false if the search text ends with the pattern.", _
        Category:="Regular Expressions"

    '**** Regular Expressions ****

    Application.MacroOptions Macro:="xxEMAIL", _
        Description:="Pattern to match an email address.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xxUSZIP", _
        Description:="Pattern to match an US Zip code.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xxPHONE", _
        Description:="Pattern to match a phone number.", _
        Category:="Regular Expressions"

    Application.MacroOptions Macro:="xxURL", _
        Description:="Pattern to match a url.", _
        Category:="Regular Expressions"
End Sub
Vijay
  • 891
  • 3
  • 19
  • 35
  • added the code here to make it seem less like an ad, so you can see that it is a useful and quick to use library – Vijay Jun 13 '18 at 11:52