1

I have an input like:

apple, orange, (pear, banana, grape), mango 

that I want to split as:

apple
orange
(pear, banana, grape)
mango

I don't understand the regex fully, but I think I would use ,(?![^()]*)) which I found here - Java split string on comma(,) except when between parenthesis ()

I am using VBA, so if I have the input stored in an array, I would typically do:

array = Split(string, ",")

But this would yield the result as:

apple
orange
(pear
banana
grape)
mango

which I don't want.

I'm pretty sure I can find a way to replace ( and ) with nothing so they disappear from the output, but I don't know how to feed my regex string logic to my VBA formula.

I thought something like this would work:

array = Split(string, ",(?![^()]*\))")

But it doesn't. I did enable the "Microsoft VBScript Regular Expressions 5.5" reference but it didn't seem to help.

Any advice is appreciated.

Thank you,

BigBen
  • 46,229
  • 7
  • 24
  • 40
Kamui
  • 719
  • 1
  • 9
  • 16
  • 1
    See: https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops – Scott Craner Oct 16 '19 at 20:24

2 Answers2

1

Alternative to RegEx:

Sub mytry()
    Dim str As String
    str = "apple, orange, (pear, banana, grape), mango "

    Dim perenSplt() As String
    perenSplt = Split(Replace(str, ")", ")("), "(")

    str = ""

    Dim i As Long
    For i = LBound(perenSplt) To UBound(perenSplt)
        If InStr(perenSplt(i), ")") Then
            perenSplt(i) = "(" & Replace(perenSplt(i), ",", "|")
        End If
        str = str & perenSplt(i)
    Next i

    Dim finalSplt() As String
    finalSplt = Split(str, ",")

    For i = LBound(finalSplt) To UBound(finalSplt)
        If InStr(str, "(") > 0 Then
            finalSplt(i) = Trim(Replace(finalSplt(i), "|", ","))
        Else
            finalSplt(i) = Trim(finalSplt(i))
        End If
    Next i

    ActiveSheet.Range("A1").Resize(,UBound(finalSplt) + 1) = finalSplt

End Sub

enter image description here

Scott Craner
  • 148,073
  • 10
  • 49
  • 81
  • Can you explain how this works? I don't like copy/pasting code without understanding it better. It looks like you try to change ) into )( and then split the string by ( so it ends up as: "apple, orange, ", "pear, banana, grape)" and ", mango" Then you check if ) exists in the result and if so replace commas with | Then you later replace | with comas again...? – Kamui Oct 16 '19 at 21:02
  • Yes, that is exactly how it works. After replacing the `,` with `|` I split on the `,` and since not `,` exist at that time it does not split the words inside the `()`. Then as a last step I remove any extraneous spaces from the output. – Scott Craner Oct 16 '19 at 21:13
  • Without being able to share my spreadsheet, I was able to incorporate your logic in my code and got it to work for the most part. I am looping over each row in a given range. And for each row, I try to access finalSplt(0), finalSplt(1), and finalSplt(2) unique to that row to access first, second, and third elements in the array. – Kamui Oct 16 '19 at 21:47
  • finalSplt(0) seems to work consistently, but when I try to access finalSplt(1) or finalSplt(2) and add it to the value of a new cell, only some of the elements get written. e.g. if finalSplt(0) = apple, finalSplt(1) = (pear, banana, orange, grape), and finalSplt(2) = mango, I sometimes only get the first two results. Eventually I get a subscript out of range error. Are there any limitations to using an approach that would cause this, or is it likely something else with my code? – Kamui Oct 16 '19 at 21:50
  • It means you have fewer "words" in the split array. If the string is only `Apple, (Pear, Orange, Banana)` there are only `finalSplt(0)` and `finalSplt(1)`. `finalSplt(2)` would error as there is no such item for that line. – Scott Craner Oct 16 '19 at 21:52
  • @Kamui see edit. I have changed the last line to output the result in a column. You only need to change the `"A1"` to the first cell in each row where you wan the output. Then it is not reliant on the number of items but is dynamic. – Scott Craner Oct 16 '19 at 21:55
  • Yes that makes sense. Not all rows will have three elements. Is there a check I can add to ensure the code doesn't get interrupted if I can't fill all elements in my array? – Kamui Oct 16 '19 at 21:57
  • At this point you need to ask a new question showing the part of the code that is in error with desired outcome and inputs. – Scott Craner Oct 16 '19 at 21:58
  • Ok if I can't figure this separate part out I'll add a new thread. Thank you for your help! – Kamui Oct 16 '19 at 21:59
1

another possibility out of RegEx:

Function GetArray(ByVal str As String) As Variant       
    Dim s As String, v As Variant
    For Each v In Split(Replace(str & " ", ")", "("), "(")
        s = s & IIf(Right(v, 1) <> " ", "(" & v & ")", Replace(v, ",", "|"))
    Next

    GetArray = Split(Replace(WorksheetFunction.Trim(s), "| ", "|"), "|")
End Function

which you can use in your main code like:

    Dim myArray As Variant        
    myArray = GetArray("apple, orange, (pear, banana, grape), mango")
    Range("A1").Resize(ubound(myArray) + 1).Value = Application.Transpose(myArray)
DisplayName
  • 13,283
  • 2
  • 11
  • 19