
Greetings All,

I need to optimize a RegEx I am using to parse template tags in my CMS. A tag can be either a single tag or a matching pair. An example of some tags:

{static:input:title type="input"}

{static:image:picture}<img src="{$img.src}" width="{$img.width}" height="{$img.height"} />{/static:image:picture}

Here is the RegEx I currently have. It properly selects what I need, but when I ran it through the RegexBuddy debugger, it took tens of thousands of steps to do one match if the HTML page is quite large.

{static([\w:]*)?\s?(.*?)}(?!"|')(?:((?:(?!{static\1).)*?){/static\1})?

When this matches a tag, Group 1 is the tag name, which is all the colon-separated words after "static". Group 2 is the parameters. And Group 3 (if it's a tag pair) is the content between the opening and closing tags.
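To make the group layout concrete, here is a minimal sketch that runs the same pattern over simple inputs. It is written in Python rather than PHP purely for illustration, the braces are escaped for Python's `re` module, and `match_tags` is a hypothetical helper name, not part of the CMS.

```python
import re

# The pattern from the question, with literal braces escaped.
TAG_RE = re.compile(
    r"""\{static([\w:]*)?\s?(.*?)\}(?!"|')"""
    r"""(?:((?:(?!\{static\1).)*?)\{/static\1\})?"""
)


def match_tags(text):
    """Return (group 1, group 2, group 3) for every match in text."""
    return [m.groups() for m in TAG_RE.finditer(text)]


if __name__ == "__main__":
    samples = [
        '{static:input:title type="input"}',
        "{static:image:picture}<b>x</b>{/static:image:picture}",
    ]
    for sample in samples:
        for name, params, content in match_tags(sample):
            print(repr(name), repr(params), repr(content))
```

Group 3 is `None` for a single tag because the optional trailing group does not take part in the match.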

I'm also having problems when I put these tags inside my conditional tags. Something like this doesn't match Group 2 properly (Group 2 should be blank in both of the matched tags below):

{if "{static:image:image1}"!=""}
    <a href="{static:image:image1}" rel="example_group" title="Image 1"></a></li>
{/if}

Another situation that needs to work is the same tag being used twice in a row, with the first instance used as a single tag and the second used as a tag pair. So something like this:

{static:image:picture}
{static:image:picture}<img src="{$img.src}" width="{$img.width}" height="{$img.height"} />{/static:image:picture}

There needs to be two separate matches. The first match would have only group 1. The second match would have group 1 and group 3.

If anyone needs more information, please don't hesitate to ask. The CMS is built in PHP using the CakePHP framework.

Big kudos to anyone who can help me out :D!

Cody L.
  • Use Smarty (http://smarty.net/) – Cfreak Oct 08 '10 at 03:44
  • I'm not going to use Smarty. I want this to be tightly coupled to my CMS. I do not need all of the "Features" that Smarty has. – Cody L. Oct 08 '10 at 03:48
  • use mustache http://github.com/bobthecow/mustache.php you are trying to do this by the looks of things http://github.com/bobthecow/mustache.php/blob/master/examples/dot_notation/dot_notation.mustache – dogmatic69 Oct 08 '10 at 16:27
  • This looks very promising actually! It is very similar to what I'm doing. I just need to hook into CakePHP properly. With the function tags, it needs to call actions inside a controller. Or maybe not... since models don't really have to be accessed through a controller... This has opened up many options :). Thanks dogmatic69! – Cody L. Oct 10 '10 at 12:35

2 Answers


Your syntax is too complicated for regular expressions. You need a context-free grammar. (Read up on the Chomsky hierarchy to understand why.)

I second the recommendation to use an existing template language (such as Smarty) rather than inventing your own.
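As a rough illustration of the parser-style alternative (a sketch only, in Python, not a full context-free grammar): a small regex finds individual tag tokens, and a stack pairs each closing tag with the nearest unclosed opening tag of the same name. `TOKEN_RE` and `parse_tags` are hypothetical names, and the token syntax is assumed from the examples in the question.

```python
import re

# One token = one opening or closing static tag; content is never matched here.
TOKEN_RE = re.compile(r"\{(/?)static((?::\w+)*)(?:\s+([^{}]*))?\}")


def parse_tags(text):
    """Return tags in document order; content is None for single tags."""
    tags = []        # every opening tag seen so far
    open_stack = []  # indexes into tags that have no closing tag yet
    for m in TOKEN_RE.finditer(text):
        closing, name, params = m.group(1), m.group(2), m.group(3) or ""
        if not closing:
            tags.append({"name": name, "params": params,
                         "content": None, "end": m.end()})
            open_stack.append(len(tags) - 1)
            continue
        # Closing tag: walk back to the nearest unclosed tag with this name.
        for pos in range(len(open_stack) - 1, -1, -1):
            tag = tags[open_stack[pos]]
            if tag["name"] == name:
                tag["content"] = text[tag["end"]:m.start()]
                # Tags opened after it stay single tags.
                del open_stack[pos:]
                break
    return tags


if __name__ == "__main__":
    text = ("{static:image:picture}\n"
            "{static:image:picture}<b>x</b>{/static:image:picture}")
    for tag in parse_tags(text):
        print(tag["name"], repr(tag["params"]), repr(tag["content"]))
```

Each token is visited once, so the work grows with the number of tags rather than with backtracking over the page content. A closing tag with no matching opening tag is simply ignored in this sketch.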

zwol
  • Greetings Zack, thanks for your reply. I'm sorry but I do not have the mental capacity to follow anything that has been written on that article you posted. As for my syntax, it seems pretty simple to me. I've got the RegEx working, it just needs some fine tuning. As for using another template language, I'm not going to do that. I haven't found one I like nor one that will fit nicely into how my CMS works. The one I've written works but just needs some fine tuning. – Cody L. Oct 08 '10 at 03:55
  • I'll try to summarize: you showed several examples that hinge on tags being nested inside other tags' parameters or content. It is *mathematically impossible* for regular expressions to process a syntax that involves nesting. You may be able to get it working on toy examples but you will not be able to make it handle the general case. A context-free parser is the correct tool for this job. And it will make your performance problem go away, too. http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns discusses the problem you're facing with less math. – zwol Oct 08 '10 at 04:05
  • I see what you are getting at but no, my template engine does not need to be recursive like you think. These tags are not allowed to be inside of one another. The only nesting is my conditional parser and those are working just fine. So you can't have something like: {static:input:foo param="{static:input:bar}"}. – Cody L. Oct 08 '10 at 04:12
  • That's basically what I meant by "you may be able to get it working on toy examples". Technically, context-free parsers are only *necessary* if you have *unlimited* nesting; up to any fixed level you *can* do it with regexes. However, you are making life harder for yourself by insisting on regexes. Your second and third problems (where things don't match what you want them to) are trivial to solve in a context-free grammar, but enormously difficult in REs, and like I said, your performance problem will also vanish if you switch. – zwol Oct 08 '10 at 13:22

I've come up with a solution that is working very nicely for now. I'm going through and grabbing all the paired tags first, and then grabbing the single tags after that. I then use PHP to handle the recursive aspect of tags being inside other tags' content.
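A minimal sketch of that two-pass idea, written in Python for illustration rather than taken from the actual PHP code. The patterns, the `find_tags` name, and the dictionary layout are all assumptions; the pair pattern refuses to run past another opening tag of the same name, so a single tag followed by a pair of the same tag gives two separate results.

```python
import re

# Pass 1: paired tags. The content may not cross a same-name opening tag.
PAIR_RE = re.compile(
    r"\{static((?::\w+)+)(?:\s+([^{}]*))?\}"  # opening tag: name, params
    r"((?:(?!\{static\1[\s}]).)*?)"           # content between the tags
    r"\{/static\1\}",                         # matching closing tag
    re.DOTALL,
)
# Pass 2: whatever opening tags are left are single tags.
SINGLE_RE = re.compile(r"\{static((?::\w+)+)(?:\s+([^{}]*))?\}")


def find_tags(text):
    """Return tags sorted by position; children come from recursing into content."""
    found = []

    def record_pair(m):
        found.append({
            "start": m.start(), "name": m.group(1),
            "params": m.group(2) or "", "content": m.group(3),
            # Child offsets are relative to the content string.
            "children": find_tags(m.group(3)),
        })
        # Blank out the pair so pass 2 cannot see it; offsets stay the same.
        return " " * len(m.group(0))

    remainder = PAIR_RE.sub(record_pair, text)
    for m in SINGLE_RE.finditer(remainder):
        found.append({"start": m.start(), "name": m.group(1),
                      "params": m.group(2) or "", "content": None,
                      "children": []})
    found.sort(key=lambda tag: tag["start"])
    return found


if __name__ == "__main__":
    text = ("{static:image:picture}\n"
            "{static:image:picture}<b>x</b>{/static:image:picture}")
    for tag in find_tags(text):
        print(tag["start"], tag["name"], repr(tag["content"]))
```

Replacing each matched pair with spaces of the same length keeps the offsets of the remaining single tags aligned with the original text.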

The suggestion that dogmatic69 came up with might be a more complete fix further down the track.

Thank you all for your suggestions and possible solutions.

Cody L.