
I'm trying to write a PHP template engine.

Consider the following string:

@foreach($people as $person)
    <p></p>
@end

I am able to use the following regex to find it:

@[\w]*\(.*?\).*?@end

But if I have this string:

@cake()
    @cake()
        @fish()
        @end
    @end
@end

The regex fails; this is what it matches:

@cake()
    @cake()
        @fish()
        @end
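You can see why by running the pattern on the nested string. This is a minimal sketch that assumes the `s` modifier was in use. Without it, `.` does not match newlines and the pattern would find nothing in a multi-line string. The variable names are illustrative only.

```php
<?php
// Assumption: the asker used the `s` modifier, so `.` also matches newlines.
$template = "@cake()\n    @cake()\n        @fish()\n        @end\n    @end\n@end";

$lazy = '~@[\w]*\(.*?\).*?@end~s';
preg_match($lazy, $template, $m);

// The lazy `.*?` is satisfied by the first "@end" it reaches (the one that
// closes @fish()), so the match stops four lines in.
echo $m[0], "\n";
```

A pattern of this shape has no way to count how many blocks are still open, so it always stops at the first `@end`.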

Thanks in advance.

Petter Thowsen

2 Answers


You can match nested functions with a recursive pattern, for example:

$pattern = '~(@(?<func>\w++)\((?<param>[^)]*+)\)(?<content>(?>[^@]++|(?-4))*)@end)~';

or without named captures:

$pattern = '~(@(\w++)\(([^)]*+)\)((?>[^@]++|(?-4))*)@end)~';
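As a quick check, here is a minimal sketch that applies the numbered-group pattern to the nested example from the question with `preg_match`. `$template` is only an illustrative name. The captured groups describe the outermost block.

```php
<?php
// The recursive pattern (numbered-group version).
$pattern = '~(@(\w++)\(([^)]*+)\)((?>[^@]++|(?-4))*)@end)~';

// The nested example from the question.
$template = "@cake()\n    @cake()\n        @fish()\n        @end\n    @end\n@end";

if (preg_match($pattern, $template, $m) === 1) {
    // Group 0 (and group 1) is the whole outer block, down to the last @end.
    var_dump($m[0] === $template);   // bool(true)
    // Group 2 is the outer function name; group 4 is its body, nested blocks included.
    echo $m[2], "\n";                // cake
    echo $m[4], "\n";
}
```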

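Note that you can get the content of every nested function, not only the outermost one, if you put the whole pattern in a lookahead (?=...). The lookahead consumes no characters, so matching can start again at each inner @.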
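A minimal sketch of the lookahead variant, assuming the same test string. The match itself is empty, so `preg_match_all` moves on one character after each match and reports a block at every `@name(` where the recursive pattern succeeds. All the text lives in the capturing groups.

```php
<?php
// The recursive pattern wrapped in a lookahead: zero-length matches, text in the groups.
$pattern = '~(?=(@(\w++)\(([^)]*+)\)((?>[^@]++|(?-4))*)@end))~';

$template = "@cake()\n    @cake()\n        @fish()\n        @end\n    @end\n@end";

preg_match_all($pattern, $template, $matches);

// $matches[2] holds the function names in the order the blocks start,
// and $matches[1] the full text of each block.
print_r($matches[2]);   // cake, cake, fish
```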

Pattern details (the same pattern in free-spacing mode, which requires the x modifier):

~                    # pattern delimiter
(                    # open the first capturing group
    @(\w++)          # function name in the second capturing group
    \(               # literal (
    ([^)]*+)         # param in the third capturing group
    \)               # literal )
    (                # open the fourth capturing group
        (?>          # open an atomic group
            [^@]++   # any characters except @, one or more times
          |          # OR
            (?-4)    # recurse into the first capturing group (the fourth group opened to the left of the current position)
        )*           # close the atomic group, repeat zero or more times
    )                # close the fourth capturing group
    @end             # literal @end
)~x                  # close the first capturing group, end delimiter, x (free-spacing) modifier
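The numbered groups can drive a simple walk over the template. The sketch below is hypothetical (`walkBlocks` is not part of any library). It matches only the blocks at the current level, records groups 2 and 3, and then runs again on group 4, the block body.

```php
<?php
// Hypothetical helper: lists the blocks level by level, using group 2 (name),
// group 3 (param) and group 4 (body) of the recursive pattern.
function walkBlocks(string $text, int $depth = 0): array
{
    $pattern = '~(@(\w++)\(([^)]*+)\)((?>[^@]++|(?-4))*)@end)~';
    $lines = [];

    // Matches do not overlap, so this only sees the blocks at the current
    // level; the nested ones are inside each match's group 4.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $lines[] = str_repeat('  ', $depth) . $m[2] . '(' . $m[3] . ')';
        // Recurse into the body to reach the next level down.
        $lines = array_merge($lines, walkBlocks($m[4], $depth + 1));
    }
    return $lines;
}

$template = "@cake()\n    @cake()\n        @fish()\n        @end\n    @end\n    @pie(apple)\n    @end\n@end";
echo implode("\n", walkBlocks($template)), "\n";
```

In a template engine you would render each block here instead of collecting its name.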
Casimir et Hippolyte
  • Wow, awesome! Thanks :) I should learn how that works. Do you know any resources, like books or videos, for these advanced regexes? (To me they seem fairly advanced, anyway.) – Petter Thowsen Aug 03 '13 at 08:52
  • @PetterThowsen: you can find more information about recursive regexes here: http://www.rexegg.com/regex-recursion.html – Casimir et Hippolyte Aug 03 '13 at 13:31

You have nesting, which takes you out of the realm of a regular grammar, which means that strictly regular expressions can't handle it. Some regular expression engines, including PCRE (which PHP's preg_* functions use), have features such as recursion that let you recognize some nested expressions, but that will only take you so far. Look into traditional parsing tools, which should be able to handle your workload. This question goes into some of them.
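For comparison, here is a minimal hand-written alternative rather than an actual parsing tool: split the source into tag and text tokens, then keep a stack of open blocks. Everything here (`Node`, `parseTemplate`, the `@name(args)` / `@end` tag syntax) is an assumption for illustration, and it needs PHP 8 for constructor property promotion.

```php
<?php
// Hypothetical node type: a block tag plus its children (plain text or nested nodes).
final class Node
{
    /** @var array<int, Node|string> */
    public array $children = [];

    public function __construct(
        public string $name,
        public string $args = ''
    ) {}
}

// Hypothetical parser: tokenize, then track open blocks on a stack.
function parseTemplate(string $src): Node
{
    // Capturing the tag alternatives makes preg_split return them as tokens;
    // the text between tags comes back as separate tokens.
    $tokens = preg_split(
        '~(@end\b|@\w+\([^)]*\))~',
        $src,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $root = new Node('root');
    $stack = [$root];                        // innermost open block is last

    foreach ($tokens as $tok) {
        $top = $stack[count($stack) - 1];
        if ($tok === '@end') {
            if (count($stack) === 1) {
                throw new RuntimeException('Unexpected @end');
            }
            array_pop($stack);               // close the innermost block
        } elseif (preg_match('~^@(\w+)\(([^)]*)\)$~', $tok, $m)) {
            $node = new Node($m[1], $m[2]);
            $top->children[] = $node;        // attach to the enclosing block
            $stack[] = $node;                // and make it the innermost block
        } else {
            $top->children[] = $tok;         // plain text
        }
    }

    if (count($stack) !== 1) {
        throw new RuntimeException('Missing @end');
    }
    return $root;
}

// Usage: the nested example from the question.
$tree = parseTemplate("@cake()\n    @cake()\n        @fish()\n        @end\n    @end\n@end");
echo $tree->children[0]->children[1]->children[1]->name, "\n";   // fish
```

Unlike the regex, this also reports unbalanced input with a clear error instead of silently matching less.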

icktoofay
  • OK, do you have any pointers on where to find a "parsing tool"? – Petter Thowsen Aug 03 '13 at 04:48
  • @Petter: [This question](http://stackoverflow.com/q/2093228) goes through some of them. Unfortunately, even once you've got an appropriate tool, it might not be obvious how to use it. I'm not sure there's much material on writing parsers in PHP; PHP isn't a very popular language for that sort of thing. Should you go down this path, you might need to do more research on parsing techniques. – icktoofay Aug 03 '13 at 04:53
  • Alright, thanks. I think I have a solution, though: I just need to read the string line by line, count the @something() openers and the @end closers, and wait until there are equally many of both. – Petter Thowsen Aug 03 '13 at 04:58
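The line-by-line counting idea from the last comment could look roughly like this. It is a sketch under assumptions: `topLevelBlocks` is a hypothetical name, openers are taken to be `@name(...)` and closers `@end`, several tags on one line are all counted, and text outside any block is skipped.

```php
<?php
// Hypothetical sketch of the counting approach: a block is complete when the
// number of @name(...) openers seen equals the number of @end closers.
function topLevelBlocks(string $src): array
{
    $blocks = [];
    $current = [];
    $depth = 0;

    foreach (explode("\n", $src) as $line) {
        $opens  = preg_match_all('~@(?!end\b)\w+\([^)]*\)~', $line);
        $closes = preg_match_all('~@end\b~', $line);

        if ($depth === 0 && $opens === 0 && $closes === 0) {
            continue;                        // text outside any block is ignored
        }

        $current[] = $line;
        $depth += $opens - $closes;

        if ($depth < 0) {
            throw new RuntimeException('Unexpected @end');
        }
        if ($depth === 0) {                  // openers and @end counts are equal
            $blocks[] = implode("\n", $current);
            $current = [];
        }
    }

    if ($depth !== 0) {
        throw new RuntimeException('Missing @end');
    }
    return $blocks;
}

$src = "@cake()\n    @cake()\n        @fish()\n        @end\n    @end\n@end\n@pie()\n@end";
print_r(topLevelBlocks($src));
```

Like the regex, this only locates the top-level blocks; it does not build the nested structure by itself.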