3

I need to tokenize a mathematical expression using VBA. I have a working solution but am looking for a more efficient way of doing it (possibly RegExp).

My current solution:

Function TokeniseTheString(str As String) As String()

Dim Operators() As String
' Array of Operators:
Operators = Split("+,-,/,*,^,<=,>=,<,>,=", ",")

' add special characters around all "(", ")" and ","
str = Replace(str, "(", Chr(1) & "(" & Chr(1))
str = Replace(str, ")", Chr(1) & ")" & Chr(1))
str = Replace(str, ",", Chr(1) & "," & Chr(1))

Dim i As Long
' add special characters around all operators
For i = LBound(Operators) To UBound(Operators)
    str = Replace(str, Operators(i), Chr(1) & Operators(i) & Chr(1))
Next i

' for <= and >=, there will now be two special characters between them instead of being one token
' to change <  = back to <=, for example
For i = LBound(Operators) To UBound(Operators)
    If Len(Operators(i)) = 2 Then
        str = Replace(str, Left(Operators(i), 1) & Chr(1) & Chr(1) & Right(Operators(i), 1), Operators(i))
    End If
Next i

' if there was a "(", ")", "," or operator next to each other, there will be two special characters next to each other
Do While InStr(str, Chr(1) & Chr(1)) > 0
    str = Replace(str, Chr(1) & Chr(1), Chr(1))
Loop
' Remove special character at the end of the string:
If Right(str, 1) = Chr(1) Then str = Left(str, Len(str) - 1)

TokeniseTheString = Split(str, Chr(1))

End Function

Test using this string IF(TestValue>=0,TestValue,-TestValue) gives me the desired solution.

Sub test()
Dim TokenArray() As String
TokenArray = TokeniseTheString("IF(TestValue>=0,TestValue,-TestValue)")
End Sub

I have never seen regular expressions before and tried to implement this into VBA. The problem I am having is that the RegExp object in VBA doesn't allow positive lookbehind.

I will appreciate any more efficient solution than mine above.

Community
  • 1
  • 1
Malan Kriel
  • 301
  • 1
  • 2
  • 17
  • 1
    As a first simple step without RegEx you could restrict the second loop matching ">=" and "<=" to `For i = 5 To 6` `str = Replace(str, Left(Operators(i), 1) & Chr(1) & Chr(1) & Right(Operators(i), 1), Operators(i))` `Next i` without checking for length as 0-based Indices 5 and 6 are of length 2. RegEx can be fast, but not in any case; you could consider to check the string in binary mode, too. Further idea: you could build a hierarchy of Tokens and operators using XMLDom :-) ` – T.M. Jun 29 '19 at 07:36
  • 1
    A `RegEx` will simplify the code, but converting the string to a buffer `Dim buffer() As Byte: buffer = str` and looping around each character with a `Select Case` would be less expensive. If you want to try `RegEx`, then use the pattern `"(""(?:""""|[^""])*""|[^ ()+-/\*^<>=,]+|<=|>=|\S)\s*"` and replace/split the matches: `tokens = Split(re.Replace(str, "$1" & ChrW(-1)), ChrW(-1))`. – Florent B. Jun 29 '19 at 11:42
  • Why do you think you need a positive lookbehind? What expression did you use? – Wiktor Stribiżew Jun 29 '19 at 15:28
  • Thank you @FlorentB. I have posted the function below that works the same as my original solution. Do you mind explaining the different parts of the pattern? – Malan Kriel Jun 29 '19 at 16:45

1 Answers1

1

As suggested by @Florent B, the following function gives the same results using RegExp:

Function TokenRegex(str As String) As String()
Dim objRegEx As New RegExp
Dim strPattern As String

strPattern = "(""(?:""""|[^""])*""|[^\s()+\-\/*^<>=,]+|<=|>=|\S)\s*"
With objRegEx
    .Global = True
    .MultiLine = False
    .IgnoreCase = True
    .Pattern = strPattern
End With

str = objRegEx.Replace(str, "$1" & ChrW(-1))
If Right(str, 1) = ChrW(-1) Then str = Left(str, Len(str) - 1)
TokenRegex = Split(str, ChrW(-1))

End Function
Florent B.
  • 41,537
  • 7
  • 86
  • 101
Malan Kriel
  • 301
  • 1
  • 2
  • 17