
What I'm trying to do is replace "functions" within user-submitted text (say, a blog post) with certain blocks of HTML, using the option/value pairs given inside the "function". Clear? No?! Thought so :) Here's an example:

Some text, can be long, may be short, a nice story, or just a comment.
{{function option1="value1" option2="value2"}}
And some more text!
{{function2 option1="value1" option2="value2"}}

In the text, I want to replace and parse the {{function ...}} part. A more concrete example could be:

{{youtube videokey="_VIDEOKEY_"}}

which should be replaced by the youtube embed code:

<iframe width="420" height="315" src="http://www.youtube.com/embed/_VIDEOKEY_" frameborder="0" allowfullscreen></iframe>

For this I want to use the preg_replace_callback() function, so I can have some room to do some calculations on the data/options passed.


The problem: I can find and replace a substring formatted like this ({{ ... }}), and even match an option/value pair, but I cannot get every single option/value pair in the matches array, only the last one.

I have tried a lot of expressions; the one I think comes closest is:

\{\{\w+([[:space:]]+(([0-9a-zA-Z]+)=\"([0-9a-zA-Z]+)\"))+\}\}

As you can see I try to match:

  1. A string within {{ and }}
  2. In which the first part is a word
  3. Followed by one or more option/value pairs:
    • one or more spaces
    • one or more letters or digits (the option name)
    • the = sign
    • one or more letters or digits, enclosed by " (the option value)
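The list above can be written out as a commented pattern. This is only a sketch: it uses PCRE's x (extended) modifier so that each part can carry a comment, and the variable names $pattern, $text and $count are placeholders for illustration. The pattern itself is the same as the one shown above.

```php
<?php
// Sample text from the top of the question.
$text = 'Some text, can be long, may be short, a nice story, or just a comment.
{{function option1="value1" option2="value2"}}
And some more text!
{{function2 option1="value1" option2="value2"}}';

// Same expression as above, spread out with the x modifier (whitespace and # comments are ignored).
$pattern = '@
    \{\{                      # 1. opening braces
    \w+                       # 2. the first part is a word
    (                         # 3. group 1: one option/value pair, including leading spaces
        [[:space:]]+          #    one or more spaces
        (                     #    group 2: the pair without the spaces
            ([0-9a-zA-Z]+)    #    group 3: the option name
            =                 #    the = sign
            "([0-9a-zA-Z]+)"  #    group 4: the option value, enclosed by double quotes
        )
    )+                        #    group 1 may repeat one or more times
    \}\}                      # closing braces
@x';

$count = preg_match_all($pattern, $text, $matches);
```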

For example, matching the sample text above with preg_match_all gives:

array(5) (
    0 => array(2) (
        0 => string(46) "{{function option1="value1" option2="value2"}}"
        1 => string(47) "{{function2 option1="value1" option2="value2"}}"
    )
    1 => array(2) (
        0 => string(17) " option2="value2""
        1 => string(17) " option2="value2""
    )
    2 => array(2) (
        0 => string(16) "option2="value2""
        1 => string(16) "option2="value2""
    )
    3 => array(2) (
        0 => string(7) "option2"
        1 => string(7) "option2"
    )
    4 => array(2) (
        0 => string(6) "value2"
        1 => string(6) "value2"
    )
)

And when using preg_replace_callback with this regular expression of course I receive the same set of matches (in a one-dimensional array that is).


I have this solution, but don't like it (because it involves a regular expression on a regular expression match, while I think it should be possible to do it in one expression):

$input = ... // see text above
$output = preg_replace_callback('@\{\{\w+([[:space:]]+(([0-9a-zA-Z]+)=\"([0-9a-zA-Z]+)\"))+\}\}@', 'my_replace_function', $input);

function my_replace_function($match) {
    preg_match_all('@([0-9a-zA-Z]+)=\"([0-9a-zA-Z]+)\"@', $match[0], $matches);
    // do something with the $matches
}

Is it even possible to deliver to my callback function an array with ALL option/value pairs, not only the last match, and use that data to parse the string? If so, could you please point me in the right direction?

Basically the question is: can I separate repeated subpatterns in the matches?


---Edit--- The solution proposed above (capturing the whole 'function'-block, then match the option-value pairs within the matched string) is in fact the solution to this puzzle. For more detail please see the answer of @m.buettner below (the accepted one).

giorgio

1 Answer


You can't. Sorry, but it's that simple. Most regex engines do not support capturing multiple values with a single capturing group. Put differently, most regex engines support only a fixed, finite number of captures. .NET is the big exception here. But you are using PCRE, and PCRE will always return the last capture for each group (official citation pending, but search for "PCRE repeated capturing group"; all sources agree). And the number of groups is fixed by the number of parentheses in your pattern. Sometimes there are workarounds, where you transform your repeated captures into repeated matches, but I don't think that is applicable in your case either.
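To see the "last capture wins" behaviour in isolation, here is a minimal sketch. The pattern is a simplified variant of yours (it uses \w and a non-capturing outer group, which are my choices and not part of your original expression), and $subject is just one block from your example:

```php
<?php
// Two capturing groups sit inside a repeated, non-capturing group.
$pattern = '/\{\{\w+(?:\s+(\w+)="(\w+)")+\}\}/';
$subject = '{{function option1="value1" option2="value2"}}';

preg_match($pattern, $subject, $m);

// Each group keeps only what it captured on the final repetition.
var_dump($m[1]); // string(7) "option2"
var_dump($m[2]); // string(6) "value2"
```

The first pair (option1/value1) was matched along the way, but nothing in $m records it.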

So your solution is really the right way to go about it. You match the whole {{...}} block, and then parse out the key-value pairs within the callback separately.
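A rough sketch of that two-step approach, applied to your YouTube example, might look like the following. The function name render_block and the sample $input are hypothetical, and I have assumed \w for names and values (so the underscores in _VIDEOKEY_ match); adjust the character classes to whatever your real keys contain.

```php
<?php
// Hypothetical callback: receives one {{name key="value" ...}} match.
function render_block(array $match): string
{
    $name = $match[1];

    // Second pass: collect every option/value pair inside the matched block.
    preg_match_all('/(\w+)="(\w+)"/', $match[0], $pairs, PREG_SET_ORDER);
    $options = [];
    foreach ($pairs as $pair) {
        $options[$pair[1]] = $pair[2];
    }

    if ($name === 'youtube' && isset($options['videokey'])) {
        return '<iframe width="420" height="315" src="http://www.youtube.com/embed/'
            . htmlspecialchars($options['videokey'])
            . '" frameborder="0" allowfullscreen></iframe>';
    }

    // Unknown function name: leave the original text untouched.
    return $match[0];
}

$input = 'Intro {{youtube videokey="_VIDEOKEY_"}} outro';

// First pass: find whole blocks; only the function name is captured here.
$output = preg_replace_callback(
    '/\{\{(\w+)(?:\s+\w+="\w+")+\}\}/',
    'render_block',
    $input
);
```

With PREG_SET_ORDER each entry of $pairs is one complete pair, which makes building the $options array straightforward.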

Martin Ender