
I have a string that I need to split into an array. Mostly, the parts are separated by a dot (.), but sometimes the string contains a part enclosed in curly brackets { }, and any dots inside the curly brackets should not be treated as split characters.

I have built the code below to do this, but was wondering whether there is a more elegant solution (such as regular expressions):

Pov = UCase(Trim(Pov))

'Loop through the Pov and escape any dots inside curly brackets
Level = 0
Escaped = ""
For Pos = 1 To Len(Pov)
    PosChar = Mid(Pov, Pos, 1)
    If PosChar = "{" Then
        Level = Level + 1
        Escaped = Escaped & PosChar
    ElseIf PosChar = "}" Then
        Level = Level - 1
        Escaped = Escaped & PosChar
    ElseIf PosChar = "." Then
        If Level > 0 Then
            Escaped = Escaped & "^^^ This is a nested dot ^^^"
        Else
            Escaped = Escaped & PosChar
        End If
    Else
        Escaped = Escaped & PosChar
    End If
Next

'Split the Pov and replace any nested dots
PovSplit = Split(Escaped, ".")
For Part = LBound(PovSplit) To UBound(PovSplit)
    PovSplit(Part) = Replace(PovSplit(Part), "^^^ This is a nested dot ^^^", ".")
Next
neelsg

1 Answer


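No, it cannot be done "directly" with regular expressions: the curly brackets can nest to any depth, and a single pattern in the VBScript RegExp engine cannot match that kind of balanced nesting.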

Anyway, here is a solution that uses regular expressions. It is a lot of code, and whether it is faster depends on the length of your data, so you will need to test it:

Dim dicEncode
    set dicEncode = WScript.CreateObject("Scripting.Dictionary")

Dim encodeRE
    Set encodeRE = New RegExp
    With encodeRE
        .Pattern = "\{[^{}]*\}"
        .Global = True
        .IgnoreCase = True
    End With

Dim decodeRE
    Set decodeRE = New RegExp
    With decodeRE
        .Pattern = "\x00(K[0-9]+)\x00"
        .Global = True
        .IgnoreCase = True
    End With

Function encodeFunction(matchString, position, fullString)
    Dim key
        key = "K" & CStr(dicEncode.Count)
    dicEncode.Add key , matchString
    encodeFunction = Chr(0) & key & Chr(0)
End Function 

Function decodeFunction(matchString, key, position, fullString)
    decodeFunction = dicEncode.Item(key)
End Function


Dim originalString    
    originalString = "{abc.def{gh.ijk}l.m}n.o.p{q.r{s{t{u.v}}}w}.x"

Dim encodedString, workBuffer

    encodedString = originalString
    Do
        workBuffer = encodedString
        encodedString = encodeRE.Replace(encodedString,GetRef("encodeFunction"))
    Loop While encodedString <> workBuffer

    encodedString = Replace(encodedString, ".", Chr(0))

    Do 
        workBuffer = encodedString
        encodedString = decodeRE.Replace(encodedString,GetRef("decodeFunction"))
    Loop While encodedString <> workBuffer

Dim aElements, element
    aElements = Split(encodedString, Chr(0))

    WScript.Echo originalString

    For Each element In aElements
        WScript.Echo element
    Next 

This code uses a regular expression to find the innermost pairs of curly brackets in the string, replacing each pair and its enclosed data with a key that is stored in a dictionary; the loop repeats until no brackets remain. Once all the "enclosed" data has been removed from the string, the remaining dots (your split points) are replaced with a new character (which will later be used to split the string), and then the string is reconstructed from the dictionary. All the "enclosed" dots have been protected, so the split can be done on the new character (Chr(0) in the code).

It is similar to the dictionary creation of a statistical compressor but without any statistics and no compression, of course.

But this is only useful with long strings. Otherwise, your original approach is far better.

EDITED to adapt to comments

For better-performing code based on the OP's original approach: no exotic regular expressions, just reduced string concatenation and unnecessary checks removed.

Function mySplit(originalString)
Dim changedString, currentPoint, currentChar, stringEnd, level

    changedString = originalString
    stringEnd = Len(originalString)

    level = 0
    For currentPoint = 1 To stringEnd
        currentChar = Mid(originalString, currentPoint, 1)
        If currentChar = "{" Then 
            level = level + 1
        ElseIf currentChar = "}" Then
            If level > 0 Then 
                level = level - 1
            End If
        ElseIf level = 0 Then 
            If currentChar = "." Then 
                changedString = Left(changedString,currentPoint-1) & Chr(0) & Right(changedString,stringEnd-currentPoint)
            End If
        End If
    Next 

    mySplit = split( changedString, Chr(0) )
End Function 
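
As a further variation on the same idea, the sketch below avoids rebuilding the string at every split point: it remembers where the current segment starts and slices each segment out with Mid when it meets a dot at nesting level 0. The name SplitTopLevel and its local variables are hypothetical, not part of the original code, and it has not been benchmarked against mySplit, so whether it is actually faster would need to be measured on your data.

```vbscript
' Sketch only: split on dots that are outside curly brackets by slicing
' segments directly. SplitTopLevel is a hypothetical name.
Function SplitTopLevel(text)
    Dim parts(), count, segStart, pos, ch, level

    ReDim parts(Len(text))          ' upper bound: at most Len(text) + 1 segments
    count = 0
    segStart = 1
    level = 0

    For pos = 1 To Len(text)
        ch = Mid(text, pos, 1)
        If ch = "{" Then
            level = level + 1
        ElseIf ch = "}" Then
            If level > 0 Then level = level - 1   ' ignore unmatched closing brackets
        ElseIf ch = "." And level = 0 Then
            ' Top-level dot: store the segment that ends just before it
            parts(count) = Mid(text, segStart, pos - segStart)
            count = count + 1
            segStart = pos + 1
        End If
    Next

    ' Store the final segment (empty if the text ends with a top-level dot)
    parts(count) = Mid(text, segStart)
    ReDim Preserve parts(count)
    SplitTopLevel = parts
End Function

' Usage under Windows Script Host (cscript.exe)
Dim part
For Each part In SplitTopLevel("A.{B.C}.D")
    WScript.Echo part
Next
```

For the input "A.{B.C}.D" this yields the three elements A, {B.C} and D, because the dot between B and C sits at nesting level 1.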
MC ND
  • Thanks for this answer. We are only talking about 80 or so characters per string, but running this algorithm for different strings thousands of times per minute (even per second), so efficiency is still important (It basically needs to do this every time it reads/writes something to an OLAP database with lots of calcs and lots of data) – neelsg May 21 '14 at 14:42
  • @neelsg, I have included a stripped-down version of your code. Under the scenario you describe, it will perform far better. – MC ND May 21 '14 at 17:58