0

Having read this very useful post on how to use RegEx within MS Excel, I am now stuck in generating the required expression to cover following scenarios - Any help appreciated.

Objective: I want to be able to split strings like the ones below:

Example 1:

AB12345|AB56789x89402 ---> AB12345 AB56789 89402 (each of the three to be stored in different cell and the total number of sub-strings is not known in advance, could be from 0 to 10)

Example 2:

#AB03925# to be stored as AB03925

Example 3:

(ABC-SR-XYZ)|(ABC-XYZ) to be stored as ABCXYZ

Is it possible to have all of the above examples handled through a single RegEx?

So far I have created the following which partially handles cases like example 1:

    strPattern = "(^[A-Z][A-Z][0-9]{5})([|]*[A-Z][A-Z][0-9]{5})"

    If strPattern <> "" Then
        strInput = C.Value

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            C.Offset(0, 1) = regEx.Replace(strInput, "$1")
            C.Offset(0, 2) = regEx.Replace(strInput, "$2")
        End If
    End If

However I still do not know how to get rid of the pipe (|) when printing those two strings separately.

Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
Icarus
  • 139
  • 2
  • 13
  • 1
    Perhaps match all the values and replace hyphens with an empty string `(?:[A-Z][A-Z])?\d+|[A-Z]{3}-[A-Z]{3}` [Test](https://regex101.com/r/3vOV2g/1/) – The fourth bird Feb 04 '19 at 07:21
  • 1
    Why should `(ABC-SR-XYZ)|(ABC-XYZ)` be stored as `ABCXYZ` and not as `ABCSRXYZ` in one cell and `ABCXYZ` in another cell? To write a regex, we need some uniformity in rules – Gurmanjot Singh Feb 04 '19 at 07:24
  • To find a solid RegEx in most cases one example of each type is not enough. Can you provide a real example data set? – Pᴇʜ Feb 04 '19 at 08:03
  • Probably something like https://regex101.com/r/cfMoAQ/1, and then check which `SubMatches` matched, and post-process the value as required. You can't just throw in a regex and get your results anyway. – Wiktor Stribiżew Feb 04 '19 at 11:05

2 Answers2

1

In case you are interested, here is a non-regex solution.

Option Explicit

Private Function ConvertCustomText(InputString As String) As Variant
    Dim CharactersToReplace As Object: Set CharactersToReplace = CreateObject("Scripting.Dictionary")
    Dim Character           As Variant
    Dim HasDash             As Boolean

    With CharactersToReplace
        .Add "#", vbNullString
        .Add "(", vbNullString
        .Add ")", vbNullString
        .Add "-", vbNullString
        .Add "x", "|"
    End With

    HasDash = IIf(InStr(1, InputString, "-", vbBinaryCompare) > 0, True, False)

    For Each Character In CharactersToReplace
        InputString = Replace(InputString, Character, CharactersToReplace(Character))
    Next

    ConvertCustomText = IIf(HasDash, _
                            Split(InputString, "|")(UBound(Split(InputString, "|"))), _
                            Split(InputString, "|"))
End Function

Public Sub Example()
    PrintArrayOrText (ConvertCustomText("AB12345|AB56789x89402"))
    PrintArrayOrText (ConvertCustomText("#AB03925#"))
    PrintArrayOrText (ConvertCustomText("(ABC-SR-XYZ)|(ABC-XYZ)"))
End Sub

Private Sub PrintArrayOrText(VariantInput As Variant)
    Dim i As Long

    If TypeName(VariantInput) = "String" Then
        Debug.Print "Output is " & VariantInput
    Else
        For i = LBound(VariantInput) To UBound(VariantInput)
            Debug.Print "Output is " & VariantInput(i)
        Next
    End If

End Sub

It returns:

Output is AB12345
Output is AB56789
Output is 89402
Output is AB03925
Output is ABCXYZ
Ryan Wildry
  • 5,612
  • 1
  • 15
  • 35
0

For example 1 & 2 this regex works
[A-Z]{2}\d{5}|\d{5} The proof in regexstorm.
enter image description here

To get different results, you can use across columns, using the column to set the result index.

If regEx.Test(strInput) Then
    Set matches = regEx.Execute(strInput)
    Dim resultSetIndex
    resultSetIndex = Application.Caller.Column - x ' x is your first column

    If resultSetIndex < matches.Count Then
        RegxFunc = matches(resultSetIndex).Value
    Else
        RegxFunc = ""
    End If
    Else
        RegxFunc = "not matched"
End If

Application.Caller.Column is used to determine the result index.
You need to set x where you place the column. In below example, x is 7.
column to result set
For example 3 you need to elaborate more on the rule(s) for extraction.
It isn't entirely clear at this point.

SAm
  • 2,154
  • 28
  • 28
  • Thanks, however I am getting an error with Application.Caller.Column and also not sure whether by "x" you mean I should define a variable or just replace x with column 2 in the code itself. – Icarus Feb 11 '19 at 12:35
  • replce "x" with the 1st column integer. In the above example, its 7. – SAm Feb 12 '19 at 04:43