0

OK, basically what I am trying to do is create a kind of BBCode system without using regex. The code below seems like it should work perfectly, but it doesn't. It is supposed to take a string, remove all the break tags from inside every [code][/code] block, and put the result back into the full string. Then it is supposed to turn the [code][/code] tags into "pre" tags for the SyntaxHighlighter script I'm using.

Unfortunately the code doesn't work in every case: sometimes it still leaves break tags inside the [code][/code] blocks. My code is:

<?php
$string = "Hello\n[code]\nCode One\n[/code]\n[code]\nCode Two\n[/code]\n[code]\nCode    Three\n[/code]";
$string = nl2br($string);
$openArray = array();
$closeArray = array();
$original = "";
$newString = "";

$i = 0;
if(strpos($string, "[code]") === 0) {
    array_push($openArray, 0);
}
while($i = strpos($string, "[code]", $i + 1)) {
    array_push($openArray, $i);
}
while($i = strpos($string, "[/code]", $i + 1)) {
    array_push($closeArray, $i + 7);    
}
for($j = 0; $j < count($openArray); $j++) {
    $length = $closeArray[$j] - $openArray[$j];
    $original = substr($string, $openArray[$j], $length);
    $newString = strip_tags($original);
    $string = str_replace($original, $newString, $string);
}
$string = str_replace("[code]", '<pre class="brush: plain">', $string);
$string = str_replace("[/code]", '</pre>', $string);
echo $string;
?>

All answers are greatly appreciated, as I have been wondering what is wrong with this for quite some time now and I've tried many different ways!

hakre
gordsmash
  • 1
    @minitech any suggestion for the OP to do it the right way? – flowfree Jun 19 '12 at 01:11
  • @bsdnoobz: No. Parsing BBCode with `strpos` is always wrong. There is no way to fix it. – Ry- Jun 19 '12 at 01:13
  • 1
    You went the wrong way from regular expressions. You should be moving up, not down, in the power of your parsing toolset. You can find any number of pre-built libraries for parsing BBCode, but if you want to build your own, use a parser generator, not `strpos`. – Mark Reed Jun 19 '12 at 01:20
  • I don't think it's right to just say `strpos` is wrong. It's not wrong for string analysis, and I don't see the many commenters actually explaining well the problems you *might* run into with `strpos`. – hakre Jun 19 '12 at 01:44
  • @Mark Reed: Which parser generator do you suggest for PHP? Have you used one of the very few yourself? – hakre Jun 19 '12 at 03:30
  • The problem with `strpos` is that it's the wrong tool for the job. You're hammering a bolt into a nut. It's too low-level for the task, as a result of which it's very hard to get right, as OP has discovered. I have written many parsers, but I have never tried to write one in PHP, so I have no personal recommendation. Others made some [here](http://stackoverflow.com/questions/3720362/what-is-a-good-parser-generator-for-php). – Mark Reed Jun 19 '12 at 03:50

2 Answers

1

The major problem I see with your processing is that you store the positions of the opening and closing tags independently of each other. Later on you process them as if each pair belonged together, but that is not guaranteed, because you never validate that a closing code follows each opening code. Two opening codes or two closing codes in a row should give a parse error.
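To illustrate the mispairing, here is a minimal sketch. `collectPositions` is a hypothetical helper, not part of the question's code; it gathers the positions of one tag independently of the other, much like the two `while` loops do. The sample input contains an opening code that is never closed:

```php
<?php
// Hypothetical helper: collect every position of $needle in $haystack.
function collectPositions($haystack, $needle) {
    $positions = array();
    $offset = 0;
    while (FALSE !== ($found = strpos($haystack, $needle, $offset))) {
        $positions[] = $found;
        $offset = $found + 1;
    }
    return $positions;
}

$input  = "[code]one[code]two[/code]";
$opens  = collectPositions($input, "[code]");
$closes = collectPositions($input, "[/code]");

// Two opening codes but only one closing code: pairing them by index cannot work.
var_dump(count($opens), count($closes));
```

Pairing `$opens[$j]` with `$closes[$j]` silently assumes both lists have the same length and interleave correctly, which this input violates.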

You could write yourself a little helper function that, like strpos, returns the next position of an opening and closing code pair:

function codepos($string, $code, $offset) {
    // Find the next opening code at or after $offset.
    if (FALSE === $start = strpos($string, "[$code]", $offset)) {
        return FALSE;
    }
    // The closing code must follow the opening code.
    if (FALSE === $stop = strpos($string, "[/$code]", $start)) {
        throw new Exception('Close code not found.');
    }
    // A second opening code before the closing code is a parse error.
    $next = strpos($string, "[$code]", $start + 1);
    if (FALSE !== $next && $next < $stop) {
        throw new Exception('Double opening detected.');
    }
    $pos = new stdClass;
    $pos->start = $start;
    $pos->stop = $stop;
    $pos->code = $code;
    return $pos;
}

It's then easier to process this later on, as you already know that things are in order. Instead of throwing exceptions you can just return FALSE and give notice in some other way. Note that this routine does not yet check for a closing code before the first opening code.

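$offset = 0;
while ($pos = codepos($string, 'code', $offset))
{
    // ... process each code-pair.

    // Continue searching after this pair, otherwise the loop never ends.
    $offset = $pos->stop + 1;
}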
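As one possible way to fill in the loop body for the task in the question, the sketch below builds the output piece by piece instead of editing `$string` in place, so positions found earlier are not invalidated when `<br />` tags are removed. The names `nextCodePair` and `convertCodeBlocks` are hypothetical; `nextCodePair` is a compact stand-in for the helper above that returns FALSE for an unclosed code instead of throwing, and it skips the double-opening check.

```php
<?php
// Hypothetical stand-in for the helper above: returns array(start, stop) for
// the next [code]...[/code] pair at or after $offset, or FALSE if there is none.
function nextCodePair($string, $offset) {
    $start = strpos($string, "[code]", $offset);
    if (FALSE === $start) {
        return FALSE;
    }
    $stop = strpos($string, "[/code]", $start);
    if (FALSE === $stop) {
        return FALSE; // unclosed code: the rest is treated as plain text
    }
    return array($start, $stop);
}

// Assumed goal (taken from the question): strip tags inside code blocks and
// wrap each block in <pre> for SyntaxHighlighter.
function convertCodeBlocks($string) {
    $out = "";
    $offset = 0;
    while (FALSE !== ($pair = nextCodePair($string, $offset))) {
        list($start, $stop) = $pair;
        // Text before the block is copied unchanged.
        $out .= substr($string, $offset, $start - $offset);
        // Body between "[code]" (6 characters) and "[/code]".
        $body = substr($string, $start + 6, $stop - $start - 6);
        // strip_tags removes every tag, as in the question's code.
        $out .= '<pre class="brush: plain">' . strip_tags($body) . '</pre>';
        $offset = $stop + 7; // continue after "[/code]" (7 characters)
    }
    // Copy whatever follows the last block.
    return $out . substr($string, $offset);
}

$html = convertCodeBlocks(nl2br("Hello\n[code]\nCode One\n[/code]\n[code]\nCode Two\n[/code]"));
echo $html;
```

Because the source string is only read and never modified, every offset stays valid for the whole loop.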
hakre
  • Thank you for your answer! I'm going to play with the code you posted for a few minutes. – gordsmash Jun 19 '12 at 02:12
  • I finally got some working code, but I just used some regular expressions to make it work. I didn't want to use regex in the beginning because I thought I could have a little fun with the challenge of not using it, but I spent more time on this than I should have. Thank you very much for posting your code and helping me, buddy :D – gordsmash Jun 19 '12 at 06:39
0

This is for learning or for an intranet tool only; it should not even be considered for the public web.

You need to take into consideration:
Lines may be longer than the string buffer. Know that you will have a maximum line size unless you code around it.

Code for possible close tags before open tags and for possibly missing close/open tags, unless you assume the input will always be correct.

Be able to handle the following cases:

State 1: looking for one or more open tags:
    No open/close tags
    Open tag only
    Close tag first - parse fails
    One or more matching open/close tags (in proper order)
    One or more matching open/close tags (in proper order) ending with open tag
    End of document - OK

State 2: looking for close tag:
    Close tag followed by one or more matching open/close tags (in proper order)
    Close tag followed by one or more matching open/close tags (in proper order) ending with open tag
    No close tag
    End of document - parse fails
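
The two states can be sketched as a small scanner. This is only a minimal illustration, assuming a single non-nested `[code]`/`[/code]` pair; the function name `validateCodeTags` is hypothetical, and treating a second open tag in State 2 as a failure is an assumption, since the list above does not cover that case.

```php
<?php
// Hypothetical validator: returns TRUE if [code]/[/code] tags are balanced
// and in order, FALSE on any parse failure.
function validateCodeTags($text) {
    $lookingForOpen = TRUE; // State 1; FALSE means State 2 (looking for close)
    $offset = 0;
    while (TRUE) {
        $open  = strpos($text, "[code]", $offset);
        $close = strpos($text, "[/code]", $offset);
        if (FALSE === $open && FALSE === $close) {
            // End of document: OK in State 1, parse failure in State 2.
            return $lookingForOpen;
        }
        // Decide which tag comes first in the remaining text.
        $openFirst = FALSE !== $open && (FALSE === $close || $open < $close);
        if ($lookingForOpen) {
            if (!$openFirst) {
                return FALSE; // close tag first: parse fails
            }
            $lookingForOpen = FALSE;
            $offset = $open + 6;  // skip "[code]"
        } else {
            if ($openFirst) {
                return FALSE; // assumption: a second open tag is an error
            }
            $lookingForOpen = TRUE;
            $offset = $close + 7; // skip "[/code]"
        }
    }
}

var_dump(validateCodeTags("[code]a[/code] text [code]b[/code]"));
var_dump(validateCodeTags("[code]a[/code] text [code]b"));
```

The first call covers "one or more matching open/close tags" and the second covers "ending with open tag", which ends the document in State 2.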