
I'm trying to create a pattern that finds placeholders within a string so that I can replace them with variables later. I'm stuck on finding all of these placeholders according to my requirements.

I already found this post, but it only helped a little: Regex match ; but not \;

Placeholders look like this:

{&var} --> Variable stored in a dictionary --> dict("var")
{$prop} --> Property of a class cls.prop read by CallByName and PropGet
{#const} --> Some constant values by name from a function
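The three placeholder types could be dispatched in one helper, roughly like this. This is only a sketch: `ResolvePlaceholder` and `GetConstByName` are hypothetical names, the constant table is made up, and a `Collection` stands in for the class instance so that the example runs in a plain module.

```vba
' Sketch only: ResolvePlaceholder and GetConstByName are hypothetical names.
Function ResolvePlaceholder(ByVal kind As String, ByVal key As String, _
                            ByVal dict As Object, ByVal cls As Object) As Variant
    Select Case kind
        Case "&"    ' variable stored in a dictionary
            If Not dict.Exists(key) Then Err.Raise 5, , "Unknown variable: " & key
            ResolvePlaceholder = dict(key)
        Case "$"    ' property of an object, read via CallByName
            ResolvePlaceholder = CallByName(cls, key, VbGet)
        Case "#"    ' named constant from a lookup function
            ResolvePlaceholder = GetConstByName(key)
        Case Else
            Err.Raise 5, , "Unknown placeholder type: " & kind
    End Select
End Function

' Hypothetical constant lookup; replace it with your own table.
Function GetConstByName(ByVal key As String) As Variant
    Select Case LCase$(key)
        Case "pi"
            GetConstByName = 3.14159265358979
        Case "crlf"
            GetConstByName = vbCrLf
        Case Else
            Err.Raise 5, , "Unknown constant: " & key
    End Select
End Function

Sub DemoResolve()
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    dict("var") = "foo"

    Dim col As New Collection   ' stands in for the class instance
    col.Add "x"

    Debug.Print ResolvePlaceholder("&", "var", dict, col)    ' dictionary lookup
    Debug.Print ResolvePlaceholder("$", "Count", dict, col)  ' Collection.Count via CallByName
    Debug.Print ResolvePlaceholder("#", "pi", dict, col)     ' constant by name
End Sub
```

The `Exists` check is there because reading a missing key from a `Scripting.Dictionary` silently adds it instead of raising an error.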

In general I have this pattern, and it works well:

    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.pattern = "\{([#\$&])([\w\.]+)\}"

For example, for the string "Value of foo is '{&var}' and bar is '{$prop}'" I get 2 matches, as expected:

  1. (&)(var)
  2. ($)(prop)
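For reference, a minimal sketch of how both matches can be listed with this pattern. The procedure name `ListPlaceholders` is made up, and `Global = True` is assumed here because `Execute` otherwise returns only the first match.

```vba
Sub ListPlaceholders()
    Dim RegEx As Object, allMatches As Object, m As Object

    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.Pattern = "\{([#\$&])([\w\.]+)\}"
    RegEx.Global = True   ' without this, Execute stops after the first match

    Set allMatches = RegEx.Execute("Value of foo is '{&var}' and bar is '{$prop}'")
    For Each m In allMatches
        ' SubMatches is zero-based: (0) = type character, (1) = name.
        ' FirstIndex is the zero-based position of the match in the input.
        Debug.Print m.SubMatches(0), m.SubMatches(1), m.FirstIndex
    Next m
End Sub
```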

I also want to add a formatting part to this expression, like in .NET:

    String.Format("This is a date: {0:dd.MM.yyyy}", DateTime.Now);
    // This is a date: 05.07.2019
    String.Format("This is a date, too: {0:dd.(MM).yyyy}", DateTime.Now);
    // This is a date, too: 05.(07).2019

I extended the RegEx to capture that optional formatting string:

    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.pattern = "\{([#\$&])([\w\.]+):{0,1}([^\}]*)\}"
    RegEx.Execute("Value of foo is '{&var:DD.MM.YYYY}' and bar is '{$prop}'")

I get 2 matches as expected

  1. (&)(var)(DD.MM.YYYY)
  2. ($)(prop)()

At this point I noticed that I have to take care of escaped "{" and "}", because I may want to have some braces within the formatted result.

This does not work properly, because my pattern stops after "...{MM":

    RegEx.Execute("Value of foo is '{&var:DD.{MM}.YYYY}' and bar is '{$prop}'")

It would be okay to add escape characters to the text before applying the regex:

    RegEx.Execute("Value of foo is '{&var:DD.\{MM\}.YYYY}' and bar is '{$prop}'")

But how can I correctly add the negative lookbehind?

And second: how can this also work for variables that should not be resolved, even though they have the correct syntax, because the outer brace is escaped?

    RegEx.Execute("This should not match '\{&var:DD.\{MM\}.YYYY\}' but this one '{&var:DD.\{MM\}.YYYY}'")

I hope my question is not confusing and someone can help me.

Update 05.07.19 at 12:50: After the great help of @wiktor-stribiżew, the solution is complete.

As requested, here is some example code:

    Sub testRegEx()
        Debug.Print FillVariablesInText(Nothing, "Date\\\\{$var01:DD.\{MM\}.YYYY}\\\\ Var:\{$nomatch\}{$var02} Double: {#const}{$var01} rest of string")
    End Sub

    Function FillVariablesInText(ByRef dict As Dictionary, ByVal txt As String) As String
        Const c_varPattern As String = "(?:(?:^|[^\\\n])(?:\\{2})*)\{([#&\$])([\w.]+)(?:\:([^}\\]*(?:\\.[^\}\\]*)*))?(?=\})"

        Dim part As String
        Dim snippets As New Collection
        Dim allMatches, m
        Dim i As Long, j  As Long, x As Long, n As Long

        ' Create a RegEx object and execute pattern
        Dim RegEx As Object
        Set RegEx = CreateObject("VBScript.RegExp")
        RegEx.pattern = c_varPattern
        RegEx.MultiLine = True
        RegEx.Global = True
        Set allMatches = RegEx.Execute(txt)

        ' Start at position 1 of txt
        j = 1
        n = 0
        For Each m In allMatches
            n = n + 1
            Debug.Print "(" & n & "):" & m.value
            Debug.Print "    [0] = " & m.SubMatches(0) ' Type [&$#]
            Debug.Print "    [1] = " & m.SubMatches(1) ' Name
            Debug.Print "    [2] = " & m.SubMatches(2) ' Format
            part = "{" & m.SubMatches(0)
            ' Find the offset of "{" + type character within the match (skips the prefix)
            x = 1 ' Mid is 1-based, so the offset is at least 1
            Do While Mid(m.value, x, 2) <> part
                x = x + 1
            Loop
            ' 1-based position of the placeholder in txt
            i = m.FirstIndex + x
            ' Anything to add to result?
            If i <> j Then
                snippets.Add Mid(txt, j, i - j)
            End If
            ' Next 1-based start position (not index!), + 1 for the "}" that the lookahead did not consume
            j = m.FirstIndex + m.Length + 2

            ' Here a function would fetch the actual value
            ' e.g.: snippets.Add dict(m.SubMatches(1))
            ' or  : snippets.Add Format(dict(m.SubMatches(1)), m.SubMatches(2))
            snippets.Add "<<" & m.SubMatches(0) & m.SubMatches(1) & ">>"
        Next m
        ' Any text at the end? (<= so that a single trailing character is kept)
        If j <= Len(txt) Then
            snippets.Add Mid(txt, j)
        End If

        ' Join snippets
        For i = 1 To snippets.Count
            FillVariablesInText = FillVariablesInText & snippets(i)
        Next
    End Function

The function testRegEx gives me this result and debug print:

(1):e\\\\{$var01:DD.\{MM\}.YYYY
    [0] = $
    [1] = var01
    [2] = DD.\{MM\}.YYYY
(2):}{$var02
    [0] = $
    [1] = var02
    [2] = 
(3): {#const
    [0] = #
    [1] = const
    [2] = 
(4):}{$var01
    [0] = $
    [1] = var01
    [2] = 
Date\\\\<<$var01>>\\\\ Var:\{$nomatch\}<<$var02>> Double: <<#const>><<$var01>> rest of string
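As an alternative to the offset search loop, the prefix could be kept as a capturing group, so that its length gives the offset of the "{" directly. This is a sketch under that assumption, not the code used above: `FillPlaceholders` is a hypothetical name, the `SubMatches` indices shift by one because of the extra group, and the `<<...>>` marker is still a stand-in for the real lookup.

```vba
Function FillPlaceholders(ByVal txt As String) As String
    ' Same pattern as above, but the prefix is a capturing group: SubMatches(0)
    Const PAT As String = "((?:^|[^\\\n])(?:\\{2})*)\{([#&\$])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?(?=\})"

    Dim re As Object, m As Object
    Dim result As String
    Dim nextPos As Long, startPos As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = PAT
    re.MultiLine = True
    re.Global = True

    nextPos = 1
    For Each m In re.Execute(txt)
        ' 1-based position of "{": zero-based FirstIndex + prefix length + 1
        startPos = m.FirstIndex + Len(m.SubMatches(0)) + 1
        ' Copy the literal text between the previous placeholder and this one
        result = result & Mid$(txt, nextPos, startPos - nextPos)
        ' Stand-in for the real lookup: (1) = type character, (2) = name
        result = result & "<<" & m.SubMatches(1) & m.SubMatches(2) & ">>"
        ' Skip the match plus the "}" that the lookahead did not consume
        nextPos = m.FirstIndex + m.Length + 2
    Next m

    ' Mid$ returns "" when nextPos is past the end, so no length check is needed
    FillPlaceholders = result & Mid$(txt, nextPos)
End Function
```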
Screem174

2 Answers

You may use

((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?}

To make sure the consecutive matches are found, too, turn the last } into a lookahead, and when extracting matches just append it to the result, or if you need the indices increment the match length by 1:

((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?(?=})
                                                                      ^^^^^

See the regex demo and regex demo #2.

Details

  • ((?:^|[^\\])(?:\\{2})*) - Group 1 (makes sure the { that comes next is not escaped): start of string or any char but \ followed with 0 or more double backslashes
  • \{ - a { char
  • ([#$&]) - Group 2: any of the three chars
  • ([\w.]+) - Group 3: 1 or more word or dot chars
  • (?::([^}\\]*(?:\\.[^}\\]*)*))? - an optional sequence of : and then Group 4:
    • [^}\\]* - 0 or more chars other than } and \
    • (?:\\.[^}\\]*)* - zero or more repetitions of a \-escaped char and then 0 or more chars other than } and \
  • } - a } char
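A small sketch of why the lookahead matters for adjacent placeholders, using the asker's sample string `Start {$var01}{$var02} End`. The helper name `CountMatches` and the constant names are made up for this illustration.

```vba
Function CountMatches(ByVal pattern As String, ByVal txt As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    re.Global = True
    CountMatches = re.Execute(txt).Count
End Function

Sub DemoConsecutive()
    ' Everything except the closing brace, which is appended in two variants below
    Const HEAD As String = "((?:^|[^\\])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?"
    Const SAMPLE As String = "Start {$var01}{$var02} End"

    ' The consumed "}" is no longer available as the prefix char of the next match
    Debug.Print CountMatches(HEAD & "\}", SAMPLE)      ' prints 1
    ' The lookahead leaves "}" in place, so it can serve as the next prefix char
    Debug.Print CountMatches(HEAD & "(?=\})", SAMPLE)  ' prints 2
End Sub
```

With the lookahead variant the match no longer includes the closing brace, so code that works with positions has to add 1 to the match length.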
Wiktor Stribiżew
  • This one works very nicely with everything I have tested so far. Thank you very much. I have only one addition, because I do not need the first group to be captured: `(?:(?:^|[^\\\n])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?}` – Screem174 Jul 04 '19 at 23:00
  • @Screem174 Well, it is up to you to adjust the groups, you posted no code, nor how exactly you use the match objects. – Wiktor Stribiżew Jul 04 '19 at 23:14
  • Sorry for giving so little background in the question. While writing, I thought the text leading up to the problem was already pretty long. I will take care of this in my next posts. I still found a small problem: two consecutive placeholders are not recognized properly. I think this is because the pattern requires a start of line (^) or any character other than \ and \n before the opening brace, and this character is included in the match. Therefore this is impossible to match: `Start {$var01}{$var02} End` – Screem174 Jul 05 '19 at 00:09
  • This last regex works as expected. It also matches a badly formatted placeholder, but that is a convention the user has to take care of. `((?:^|[^\\\n])(?:\\{2})*)\{([#$&])([\w.]+)(?::([^}\\]*(?:\\.[^}\\]*)*))?(?=})` With this I only have to increase the length of the match by 1 in my code, which is fine :) – Screem174 Jul 05 '19 at 11:13

Welcome to the site! If you need to match only balanced escapes, you will need something more powerful. If not (I haven't tested this), you could try replacing [^\}]* with (?:[^\{\}]|\\\{|\\\})*. That is, match non-brace characters and escaped braces as separate alternatives. You may need to change this depending on how you want to handle backslashes in your formatting string.

cxw
  • Hi, thanks for the suggestion. Balancing the escapes is not important for me. I just wanted to get escaped "}" into my group for the formatting part, and to ignore the entire group if it is escaped at the beginning or end. – Screem174 Jul 04 '19 at 23:04