5

I have written a quote function for my personal forum, on a website written in PHP.

Quoted messages are tagged like [quote=username]message[/quote], so I wrote this replacement:

$str=preg_replace('#\[quote=(.*?)\](.*?)\[/quote\]#is', '<div class="messageQuoted"><i><a href="index.php?explore=userview&userv=$1">$1</a> wrote :</i>$2</div>', $str);

This works when there is a single quote, but when a user quotes a quote, it no longer works. So I need some kind of recursive replacement to handle nested quotes.

I searched many topics on SO, but I don't really understand how this can work. Any suggestions or tips on how to do this would be appreciated. Thanks!

EDIT

In the end, this is my own solution:

if(preg_match_all('#\[quote=(.*?)\](.*?)#is', $str, $matches)==preg_match_all('#\[/quote\]#is', $str, $matches)) {
    array_push($format_search, '#\[quote=(.*?)\](.*?)#is');
    array_push($format_search, '#\[/quote\]#is');

    array_push($format_replace, '<div class="messageQuoted"><a class="lblackb" href="index.php?explore=userview&userv=$1">$1</a> wrote :<br />$2');
    array_push($format_replace, '</div>');
}

$str=preg_replace($format_search, $format_replace, $str);

It replaces only if the number of opening tags equals the number of closing tags. So it should (right?) prevent broken HTML or other malicious attacks. What do you think?
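One caveat with comparing counts: equal counts do not guarantee that the tags are in a valid order. For example, `[/quote][quote=a]` has one opener and one closer but would still emit a stray `</div>`. A minimal sketch of a stricter check follows. The function name `quotes_are_balanced` is hypothetical, and the check only looks at quote tags, so it assumes the rest of the message has already been escaped (for example with `htmlspecialchars`).

```php
<?php
// Hypothetical helper: returns true only if every [/quote] closes an
// earlier [quote=...] and no quote is left open at the end.
function quotes_are_balanced(string $str): bool
{
    // Collect all opening and closing quote tags in document order.
    preg_match_all('#\[quote=[^\]]*\]|\[/quote\]#i', $str, $m);

    $depth = 0;
    foreach ($m[0] as $tag) {
        // The second character is "/" for a closing tag.
        $depth += ($tag[1] === '/') ? -1 : 1;
        if ($depth < 0) {
            return false; // a closer appeared before its opener
        }
    }
    return $depth === 0;
}
```

If the check fails, the message could be returned to the user for re-editing instead of being rendered.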

kwichz
  • 1
    You can't do recursion (unlimited nesting) with a regex. – SLaks May 08 '11 at 15:23
  • 4
    @SLaks You can in PHP, and it actually works pretty neatly. It’s still usually a bad idea though. – Konrad Rudolph May 08 '11 at 15:38
  • 1
    @SLaks, no, that's not true. Besides PHP (as Konrad already mentioned), Perl's and the .NET regex implementations also support recursive patterns. I also agree with Konrad that using these features in production code is a bad idea, but still, it _is_ possible. – Bart Kiers May 08 '11 at 16:09
  • So Mel's solution is wrong in your opinion? :O – kwichz May 08 '11 at 22:35
  • @Slaks: There you go again with your “you can’t do blah in a regex” silliness. Of course you can! For example, here is how to match parens with **unlimited nesting**: `\((?:[^()]*+|(?0))*\)`. Piece of cake. Pretty slick, really. – tchrist May 09 '11 at 00:22
  • @Bart: I have used recursive patterns in production code, when the data had recursive elements. I don’t believe that PHP or .NET allow for the kind of debugging that you can get with Perl’s `-Mre=debug` or the embedded `(?{print ⋯})` stuff. These make it a lot easier to debug these things. The clearest way to do things is to set up a `(?(DEFINE) ⋯)` block in which you load up named groups with subpatterns, then call them like regex-subs. I have an example of this style [in this answer](http://stackoverflow.com/questions/4840988/the-recognizing-power-of-modern-regexes/4843579#4843579). – tchrist May 09 '11 at 00:30
  • @kwichz Thanks for your code, it helped me a lot. I spent a full day trying to write it myself but failed. – jmp May 28 '12 at 20:25

3 Answers

4

PCRE and regexes in PHP do allow for recursion (http://php.net/manual/en/regexp.reference.recursive.php). You will need the (?R) syntax for that.

But that only matches recursively; it does not apply your replacement string recursively. That is why you need to use preg_replace_callback at the very least.

It's difficult to get working, but I believe (totally untested) this might do in your case:

$str = preg_replace_callback('#\[quote=(.*?)\]((?:(?R)|.*?)+)\[/quote\]#is',
          'cb_bbcode_quote', $str);

The callback then has to run the same regex again on the inner text in $match[2] (that is, call preg_replace_callback on it recursively) before it returns the wrapped content.
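As an illustration of that idea, here is a minimal sketch rather than a definitive implementation. The function name `render_quotes` and the stricter pattern are assumptions, not part of the original answer: the username is matched with `[^\]]*`, and the body is either a nested quote via `(?R)` or any character that does not start a quote tag. The sketch also assumes `$str` was already escaped with `htmlspecialchars` before this step, and it leaves out the profile link for brevity.

```php
<?php
// Hypothetical helper: renders balanced [quote=user]...[/quote] pairs,
// including nested ones. Unbalanced tags are left untouched.
function render_quotes(string $str): string
{
    // Group 1: username. Group 2: body, made of nested quotes (?R)
    // or single characters that do not begin a quote tag.
    $pattern = '#\[quote=([^\]]*)\]((?:(?R)|(?!\[/?quote\b).)*)\[/quote\]#is';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Render any quotes nested inside this one before wrapping it.
        $inner = render_quotes($m[2]);
        return '<div class="messageQuoted"><i>' . $m[1] . ' wrote :</i>'
            . $inner . '</div>';
    }, $str);

    // preg_replace_callback returns null on a regex error (for example
    // when a PCRE limit is hit); fall back to the unchanged input.
    return $result ?? $str;
}
```

For instance, `render_quotes('[quote=a]x[quote=b]y[/quote]z[/quote]')` wraps both the outer and the inner quote in their own `div`.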

mario
1

You can simply replace the opening quote tag with the opening div tag and do the same for the closing tag. This only goes bad if the user messes up their quote tag matching. Alternatively, you can recurse the quote function on the inner section:

<?php
function quote($str)
{
    if( preg_match('#\[quote=.*?\](.*)\[/quote\]#i', $str) )
         return quote(preg_replace('#\[quote=.*?\](.*)\[/quote\]#i', '$1', $str));
    return preg_replace('#\[quote=.*?\](.*)\[/quote\]#', '<div blabla>$1</div>', $str);
}
?>
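A related approach, sketched here under assumptions and not taken from the answer's code, avoids explicit recursion: replace only innermost quotes (those whose body contains no other quote tag) and repeat until nothing changes. The function name `quote_innermost_first` and the output markup are hypothetical, and `$str` is assumed to be HTML-escaped already.

```php
<?php
// Hypothetical helper: converts correctly paired quotes from the inside out.
function quote_innermost_first(string $str): string
{
    // The body may not contain the start of another quote tag,
    // so only innermost pairs match on each pass.
    $pattern = '#\[quote=([^\]]*)\]((?:(?!\[/?quote\b).)*)\[/quote\]#is';
    $replace = '<div class="messageQuoted"><i>$1 wrote :</i>$2</div>';

    do {
        $next = preg_replace($pattern, $replace, $str, -1, $count);
        if ($next === null) {
            break; // regex error: keep what has been converted so far
        }
        $str = $next;
    } while ($count > 0);

    return $str;
}
```

Each pass turns one nesting level into HTML, so a quote nested three deep takes three replacing passes plus a final pass that finds nothing. Unpaired tags never match and stay in the text as typed.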
Mel
  • Hmm... your first, simple solution sounds good! What about "this only goes bad"? It's the same with the regex: if the user messes up the BBCode, the regex doesn't work either. Take a look at my example in the edit... – kwichz May 08 '11 at 16:05
  • @kwichz The recursive implementation only replaces correctly paired quotes. Replacing the strings without ensuring it's a pair can make your layout go haywire if you have multiple opening quotes and only one closer. Probably the best way is to count the opening and closing quote tags, return for re-edit if they don't match up and then replace opening and closing as mentioned. – Mel May 08 '11 at 17:06
  • Yeah, you are right :) I'll try the recursive implementation! Is this a real solution in your view? Because when it comes to regex and BBCode, EVERYBODY says that this is just a workaround, not the solution! And I really don't know why :) – kwichz May 08 '11 at 22:31
  • If I copy and paste your recursive implementation, I get a PHP parse error?! – kwichz May 08 '11 at 22:57
  • I added my own solution, because your recursive pattern doesn't work! What do you think of it? Is it secure? :) – kwichz May 09 '11 at 15:28
0

Recursive syntax like this is precisely where regular expressions start being too weak. You should look into using some kind of parser instead.

Regular expressions (at least without some extensions) can only accept regular languages. In order to have a recursive syntax, you need a context-free language. These require more sophisticated parsers.
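To make the parser suggestion concrete, here is a minimal sketch of a hand-written pass that tokenizes the text into tags and plain text and tracks nesting depth with a counter. The function name `parse_quotes` and the output markup are assumptions, and `$str` is assumed to be HTML-escaped before this step. A regex is still used, but only to split tokens, not to match nesting.

```php
<?php
// Hypothetical helper: converts quote tags with a single pass over tokens.
function parse_quotes(string $str): string
{
    // Split into text and tag tokens; DELIM_CAPTURE keeps the tags.
    $tokens = preg_split(
        '#(\[quote=[^\]]*\]|\[/quote\])#i',
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    $depth = 0; // number of currently open quotes
    foreach ($tokens as $token) {
        if (preg_match('#^\[quote=([^\]]*)\]$#i', $token, $m)) {
            $depth++;
            $out .= '<div class="messageQuoted"><i>' . $m[1] . ' wrote :</i>';
        } elseif (strcasecmp($token, '[/quote]') === 0) {
            if ($depth > 0) {
                $depth--;
                $out .= '</div>';
            } else {
                $out .= $token; // stray closer: keep it as plain text
            }
        } else {
            $out .= $token; // ordinary text
        }
    }

    // Close any quotes the user left open so the HTML stays well-formed.
    return $out . str_repeat('</div>', $depth);
}
```

Unlike the pure regex approaches, this version always produces balanced `div` tags, whatever the user typed: stray closers stay visible as text, and unclosed quotes are closed at the end of the message.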

hammar
  • 2
    Most regex implementations of many of today's popular languages support back-references in their regex-patterns, making them accept more than regular languages. – Bart Kiers May 08 '11 at 16:12
  • 1
    Hammar: As soon as we had `(.*)\1`, we blew your formal language theory out of the water. That was when, like 1970? Even POSIX regexes, old and rickety, **require** backref support. Modern regex go far beyond that, including in PHP. Modern regexes are indeed fully equivalent to a recursive descent parser. – tchrist May 09 '11 at 00:24
  • @tchrist: Well, then they aren't really regular expressions anymore, are they? They are more like parser specifications in a similar syntax. I'll leave my answer up for completeness, although I realize the OP might be more interested in a pragmatic solution than formal language theory :) – hammar May 09 '11 at 00:48