
I'm learning about regular expressions and want to write a templating engine in PHP.

Consider the following "template":

<!DOCTYPE html>
<html lang="{{print("{hey}")}}" dir="{{$dir}}">
<head>
    <meta charset="{{$charset}}">
</head>
<body>
    {{$body}}
    {{}}
</body>
</html>

I managed to create a regex that will find anything except for {{}}.

Here's my regex:

{{[^}]+([^{])*}}

There's just one problem. How do I allow the literal { and } to be used within {{}} tags?

It will not find {{print("{hey}")}}.

Thanks in advance.

Erman Belegu
Petter Thowsen
  • Just by way of introduction to HTML and Regex: http://stackoverflow.com/a/1732454/758446 – BlackVegetable Aug 02 '13 at 23:28
  • You could use a generic placeholder instead of the negated charclass for excluding `{` and `}` within. – mario Aug 02 '13 at 23:28
  • As we all know, regular expressions are extremely slow, especially when it comes to large strings. In your case, you can simply stick with `str_replace()`, and that would be good enough for you. – Yang Aug 03 '13 at 01:51

4 Answers


You can just use "." instead of the character classes, but then you have to use non-greedy quantifiers:

\{\{(.+?)\}\}

The quantifier "+?" consumes as few characters as possible (at least one) while still letting the rest of the pattern match.

Consider this example:

<table>
  <tr>
    <td>{{print("{first name}")}}</td><td>{{print("{last name}")}}</td>
  </tr>
</table>

With a greedy quantifier (+ or *), you'd get only one result: after the first {{, .+ consumes as many characters as it can and then backtracks only until }} can match, so the match ends at the last }} on the line:

{{print("{first name}")}}</td><td>{{print("{last name}")}}

With a non-greedy one (+? or *?) you'll get the two as separate results:

{{print("{first name}")}}
{{print("{last name}")}}
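A quick way to check this in PHP is `preg_match_all()`. This sketch runs both quantifiers on the example string above; the variable names are only for illustration.

```php
<?php
// The two-cell example from above, as a single PHP string.
$html = '<td>{{print("{first name}")}}</td><td>{{print("{last name}")}}</td>';

// Greedy: .+ runs to the end of the string, then backtracks to the LAST "}}".
preg_match_all('/\{\{(.+)\}\}/', $html, $greedy);

// Lazy: .+? grows one character at a time and stops at the FIRST "}}" that fits.
preg_match_all('/\{\{(.+?)\}\}/', $html, $lazy);

print_r($greedy[0]); // one match spanning both cells
print_r($lazy[1]);   // print("{first name}") and print("{last name}")
```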
Pik'
  • I tried it, it works. But what if I have the following in the template? It won't find it correctly. – Petter Thowsen Aug 02 '13 at 23:52
  • In that case, regexps probably aren't the way to go. When it comes to recursive patterns, it's faster and clearer to code it by hand. You would go through each character in order. When you encounter `{{`, you start "recording" the next characters, and you use an "open parenthesis" counter. The counter increments for each `(`, decrements for each `)`. When you encounter a `}}` *and* the counter is zero, you're done: you have your template marker. – Pik' Aug 03 '13 at 00:21
  • Hm, that's smart! Maybe I'll try that. – Petter Thowsen Aug 03 '13 at 04:10
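A minimal PHP sketch of the hand-written scanner from the comment above, assuming tags are delimited by `{{` and `}}` and that only parentheses need counting. `extractTags()` is a hypothetical helper name, not part of any library. Because it only tracks parentheses, a `}}` inside a quoted string that is not inside parentheses would still end the tag.

```php
<?php
// Hypothetical helper: scan character by character, record text after "{{",
// count "(" and ")", and accept "}}" as the end only when the count is zero.
function extractTags(string $tpl): array
{
    $tags = [];
    $len = strlen($tpl);
    $i = 0;
    while ($i < $len - 1) {
        if ($tpl[$i] === '{' && $tpl[$i + 1] === '{') {
            $i += 2;          // skip the opening "{{"
            $depth = 0;       // the "open parenthesis" counter
            $buf = '';        // the recorded tag content
            while ($i < $len) {
                $c = $tpl[$i];
                if ($depth === 0 && $c === '}' && $i + 1 < $len && $tpl[$i + 1] === '}') {
                    $i += 2;  // skip the closing "}}"
                    $tags[] = $buf;
                    continue 2; // resume the outer scan after this tag
                }
                if ($c === '(') {
                    $depth++;
                } elseif ($c === ')') {
                    $depth--;
                }
                $buf .= $c;
                $i++;
            }
            break;            // unterminated tag: stop scanning
        }
        $i++;
    }
    return $tags;
}

print_r(extractTags('<html lang="{{print("{hey}")}}" dir="{{$dir}}">{{}}'));
```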

This is a pattern to match the content inside double curly brackets:

$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted>
        ' (?: [^'\\]+ | (?:\\.)+ )++ ' |
        " (?: [^"\\]+ | (?:\\.)+ )++ "
    )
    (?<nested>
        { (?: [^"'{}]+ | \g<quoted> | \g<nested> )*+ }
    )
)

{{
    (?<content>
        (?: 
            [^"'{}]+
          | \g<quoted>  
          | \g<nested>

        )*+
    )
}}
~xs
LOD;

Compact version:

$pattern = '~{{((?>[^"\'{}]+|((["\'])(?:[^"\'\\\]+|(?:\\\\.)+|(?:(?!\3)["\'])+)++\3)|({(?:[^"\'{}]+|\g<2>|(?4))*+}))*+)}}~s';

The content is in the first capturing group, but you can use the named capture 'content' with the detailed version.

Although this pattern is longer, it allows anything inside quoted parts, including escaped quotes, and in many cases it is faster than a simple lazy quantifier. Nested curly brackets are allowed too: you can write {{ doThat(){ doThis(){ }}}} without problems.
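A usage sketch for the detailed pattern (reproduced with its whitespace condensed, which the x modifier ignores), run on a small made-up template that reuses tags from the question and the nested example above. `preg_match_all()` collects every tag, and the named group `content` holds the text between the outer brackets.

```php
<?php
// The detailed pattern from above; whitespace is condensed, which x mode ignores.
$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted>
        ' (?: [^'\\]+ | (?:\\.)+ )++ ' |
        " (?: [^"\\]+ | (?:\\.)+ )++ "
    )
    (?<nested>
        { (?: [^"'{}]+ | \g<quoted> | \g<nested> )*+ }
    )
)
{{
    (?<content>
        (?: [^"'{}]+ | \g<quoted> | \g<nested> )*+
    )
}}
~xs
LOD;

// A made-up template: a quoted brace, a plain variable, nested braces, an empty tag.
$template = <<<'TPL'
<html lang="{{print("{hey}")}}" dir="{{$dir}}">
{{ doThat(){ doThis(){ }}}}
{{}}
TPL;

preg_match_all($pattern, $template, $matches);
print_r($matches['content']);
```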

The quote subpattern can also be written like this, which avoids repeating the same thing for single and double quotes (I use this form in the compact version):

(["'])             # the quote type is captured (single or double)
(?:                # open a group (for the various alternatives)
    [^"'\\]+       # all characters that are not a quote or a backslash
  |                # OR
    (?:\\.)+       # escaped characters (the s modifier lets . match a newline too)
  |                #
    (?!\g{-1})["'] # a quote that is not the captured quote
)++                # repeat one or more times
\g{-1}             # the captured quote (-1 refers to the last capturing group)
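To see the backreference at work, this sketch wraps the quote subpattern above in `^...$` anchors (an addition for testing, so it must match a whole string) and tries it on a few made-up sample strings.

```php
<?php
// The quote subpattern from above, anchored so it must match a whole string.
$quoted = <<<'RE'
~^
(["'])             # the quote type is captured (single or double)
(?:
    [^"'\\]+       # all characters that are not a quote or a backslash
  | (?:\\.)+       # escaped characters
  | (?!\g{-1})["'] # a quote that is not the captured quote
)++
\g{-1}             # the captured quote closes the string
$~xs
RE;

$samples = [
    '"say \"hi\""', // escaped double quotes inside double quotes
    "'it\\'s'",     // escaped single quote inside single quotes
    '"it\'s"',      // a single quote inside double quotes
    '"mismatch\'',  // opened with " but closed with '
];
foreach ($samples as $s) {
    echo $s, ' => ', preg_match($quoted, $s) ? 'match' : 'no match', "\n";
}
```

The last sample fails because the closing quote has to be the same character that group 1 captured.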

Note: to match a literal backslash, the regex must contain \\. In nowdoc syntax you write it as \\, but inside a single-quoted PHP string you must write \\\ or \\\\.
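To see why, compare the strings PHP actually builds. This sketch writes the escaped-character subpattern `(?:\\.)+` from the nowdoc version in each syntax; the variable names are only for illustration. With just two backslashes in a single-quoted string, the regex becomes `(?:\.)+`, which matches literal dots instead of escaped characters.

```php
<?php
// The escaped-character subpattern as it appears in the nowdoc version.
$nowdoc = <<<'RE'
(?:\\.)+
RE;
$fourSlashes  = '(?:\\\\.)+'; // each "\\" becomes "\", giving \\
$threeSlashes = '(?:\\\.)+';  // "\\" becomes "\", and "\." is kept as-is
$twoSlashes   = '(?:\\.)+';   // "\\" becomes "\": the regex is (?:\.)+

var_dump($nowdoc === $fourSlashes);  // bool(true)
var_dump($nowdoc === $threeSlashes); // bool(true)
var_dump($nowdoc === $twoSlashes);   // bool(false)
```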

Explanations for the detailed pattern:

The pattern is divided in two parts:

  • the definitions, where I define named subpatterns
  • the main pattern itself

The definition section avoids repeating the same subpattern several times in the main pattern and makes the pattern clearer. Subpatterns that you will use later are defined inside this block:
(?(DEFINE)....)

This section contains 2 named subpatterns:

  • quoted: describes quoted parts
  • nested: describes nested curly-bracket parts
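A minimal sketch of DEFINE on its own, separate from the template pattern: the block below declares one subpattern, `quoted` (a simplified version without escapes, chosen for illustration), and the main pattern calls it twice with `\g<quoted>` to accept a two-argument `print(...)` call.

```php
<?php
// (?(DEFINE)...) matches nothing by itself; it only declares named subpatterns.
$call = <<<'RE'
~
(?(DEFINE)
    (?<quoted> "[^"]*" | '[^']*' )
)
^ print\( \g<quoted> \s*,\s* \g<quoted> \) $
~x
RE;

var_dump(preg_match($call, 'print("a", \'b\')')); // int(1)
var_dump(preg_match($call, 'print("a", b)'));     // int(0)
```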

Detail of nested:

(?<nested>           # open the named group "nested"
    {                # literal {
 ## what can the curly brackets contain? ##
    (?>              # open an atomic* group
        [^"'{}]+     # one or more characters, except "'{}
      |              # OR
        \g<quoted>   # quoted content, so curly brackets inside quotes are ignored
                     # (this calls the subpattern defined above instead of rewriting it)
      | \g<nested>   # OR curly parts. This is a recursion
    )*+              # repeat the atomic group zero or more times (possessive *+)
    }                # literal }
)                    # close the named group

(* more information about atomic groups and possessive quantifiers)
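A small PHP check of what possessive quantifiers and atomic groups change: once they have matched, they never give characters back during backtracking.

```php
<?php
// Normal greedy +: gives back one "a" so the final "a" can match.
var_dump(preg_match('/^a+a$/', 'aaa'));     // int(1)
// Possessive ++: keeps all three "a"s, so the final "a" has nothing left.
var_dump(preg_match('/^a++a$/', 'aaa'));    // int(0)
// Atomic group: same effect as the possessive quantifier here.
var_dump(preg_match('/^(?>a+)a$/', 'aaa')); // int(0)
```

In the template pattern this mostly matters for speed: when a tag cannot match, the engine gives up without retrying every way of splitting the text between the alternatives.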

But all of this is only definitions; the pattern really begins with {{. Then I open a named capture group (content) and describe what can be found inside (nothing new here).

I use two modifiers, x and s. x activates extended (verbose) mode, in which whitespace in the pattern is ignored, so you can indent freely. s is single-line (dotall) mode: the dot can also match newlines, which it can't by default. I use this mode because there is a dot in the quoted subpattern.
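A quick PHP check of both modifiers on tiny made-up inputs:

```php
<?php
// Without s, the dot does not match a newline; with s, it does.
var_dump(preg_match('/a.b/', "a\nb"));  // int(0)
var_dump(preg_match('/a.b/s', "a\nb")); // int(1)

// With x, unescaped whitespace and # comments in the pattern are ignored.
var_dump(preg_match('/a b/', 'ab'));              // int(0)
var_dump(preg_match('/a b # a comment/x', 'ab')); // int(1)
```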

Casimir et Hippolyte

Make your regex less greedy using {{(.*?)}}.
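The difference between `*?` here and the `+?` used in the first answer shows up on an empty tag. A quick PHP check with a made-up template string:

```php
<?php
$tpl = '<p>{{$body}}</p>{{}}';

// *? allows zero characters, so the empty {{}} tag is captured as "".
preg_match_all('/{{(.*?)}}/', $tpl, $star);
// +? needs at least one character, so {{}} is skipped.
preg_match_all('/{{(.+?)}}/', $tpl, $plus);

print_r($star[1]);
print_r($plus[1]);
```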

Bart

I figured it out. Don't ask me how.

{{[^{}]*("[^"]*"\))?(}})

This matches tags whose braces appear only inside a double-quoted argument followed by ), for example:

{{print("{{}}}{{{}}}}{}}{}{hey}}{}}}{}7")}}
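A quick PHP check of this pattern, using the example above plus two tags from the question (the combined template string is made up for the test). The last line shows that braces inside a single-quoted argument are not covered.

```php
<?php
$pattern = '/{{[^{}]*("[^"]*"\))?(}})/';

$tpl = '<html lang="{{print("{hey}")}}" dir="{{$dir}}">'
     . '{{print("{{}}}{{{}}}}{}}{}{hey}}{}}}{}7")}}';
preg_match_all($pattern, $tpl, $m);
print_r($m[0]); // the three full tags

// Braces inside a single-quoted argument are not covered by the "..." branch.
var_dump(preg_match($pattern, "{{print('{hey}')}}")); // int(0)
```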
Petter Thowsen