I have a function that strips unneeded whitespace from the output of my PHP page before the page is saved to an HTML file for caching.

However, some sections of my page contain source code in pre tags, and stripping the whitespace there affects how the code is displayed. My regular expression skills are poor, so I am looking for a way to stop this function from touching anything inside:

 <pre></pre>

This is the PHP function:

function sanitize_output($buffer)
{
    $search = array(
        '/\>[^\S ]+/s',  // strip whitespace after tags, except spaces
        '/[^\S ]+\</s',  // strip whitespace before tags, except spaces
        '/(\s)+/s',      // shorten multiple whitespace sequences
    );
    $replace = array(
        '>',
        '<',
        '\\1',
    );
    $buffer = preg_replace($search, $replace, $buffer);
    return $buffer;
}

Thanks for your help.

Here's what I found to work:

Solution:

function stripBufferSkipPreTags($buffer){
    $poz_current = 0;
    $poz_end = strlen($buffer);
    $result = "";

    while ($poz_current < $poz_end){
        // find the next opening <pre tag (case-insensitive)
        $t_poz_start = stripos($buffer, "<pre", $poz_current);
        if ($t_poz_start === false){
            // no more <pre> blocks: strip the rest and stop
            $result .= stripBuffer(substr($buffer, $poz_current));
            $poz_current = $poz_end;
        }
        else{
            // strip everything up to the <pre> block
            $result .= stripBuffer(substr($buffer, $poz_current, $t_poz_start-$poz_current));
            $t_poz_end = stripos($buffer, "</pre>", $t_poz_start);
            if ($t_poz_end === false){
                // unclosed <pre>: copy the remainder untouched
                $t_poz_end = $poz_end;
            }
            // copy the <pre> block as-is; "</pre>" itself is handled by the next pass
            $result .= substr($buffer, $t_poz_start, $t_poz_end-$t_poz_start);
            $poz_current = $t_poz_end;
        }
    }
    return $result;
}

function stripBuffer($buffer){
    // change new lines and tabs to single spaces
    $buffer = str_replace(array("\r\n", "\r", "\n", "\t"), ' ', $buffer);
    // collapse runs of spaces into one (preg_replace needs pattern delimiters)
    $buffer = preg_replace('/ {2,}/', ' ', $buffer);
    // remove single spaces between tags
    $buffer = str_replace("> <", "><", $buffer);
    // remove single spaces around &nbsp;
    $buffer = str_replace(" &nbsp;", "&nbsp;", $buffer);
    $buffer = str_replace("&nbsp; ", "&nbsp;", $buffer);
    return $buffer;
}
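
For comparison, the same idea (leave pre blocks alone, strip everything else) can probably be written more compactly with preg_split. This is only a rough sketch, not the function from the post: minifyOutsidePre is a made-up name, and it assumes pre blocks are not nested and that the page is small enough not to hit PCRE's backtrack limit.

```php
<?php
// Hypothetical helper, not part of the original post.
function minifyOutsidePre(string $html): string
{
    // Split on whole <pre>...</pre> blocks; PREG_SPLIT_DELIM_CAPTURE keeps
    // the captured blocks in the result, so the array alternates between
    // "outside" text (even indexes) and <pre> blocks (odd indexes).
    $parts = preg_split(
        '#(<pre\b[^>]*>.*?</pre>)#is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        // Regex engine error: fall back to the untouched input.
        return $html;
    }

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // A captured <pre> block: copy it verbatim.
            $out .= $part;
            continue;
        }
        $part = preg_replace('/\s+/', ' ', $part); // collapse whitespace runs
        $part = str_replace('> <', '><', $part);   // drop the space between tags
        $out .= $part;
    }
    return $out;
}
```

Like the loop above, this treats the markup as plain text, so a literal "<pre" inside a comment or a script would still confuse it.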

jason
  • Are you compressing for disk space? If so, have you considered using gz compression? (http://php.net/gz_deflate) – Adam Wagner Mar 29 '11 at 06:23
  • The code in the question is from: http://ru.php.net/manual/en/function.ob-start.php#71953 – Kobi Mar 29 '11 at 06:52
  • @Adam - You are correct. That should be an answer, not a comment. See also: http://stackoverflow.com/questions/3095424/minify-html-php – Kobi Mar 29 '11 at 07:00
  • Maybe the real question is "why is your PHP producing unneeded whitespace?". Rather than stripping it out, you may want to investigate why it is there, and then modify your code so that the unneeded whitespace is never emitted. This way you (1) preserve whitespace where you want it (in `<pre>` tags), and (2) avoid regex for parsing HTML, which is, in general, not a good idea (see Andrea Spadaccini's answer below). – MarcoS Mar 29 '11 at 07:03
  • Just don't do this. If you want to save a few bytes, use an HTML compressor; don't try to roll your own using some hack-job regexes. You'll create more problems than you'll solve. – mpen Mar 29 '11 at 07:08
  • I solved the issue with a function I found online. I understand where you are all coming from: regular expressions or compressing HTML don't seem to be worth it. But the whole page is only compressed once after it has been updated, and from then on the cached HTML file is served. Updating is very rare, and it doesn't save a few bytes, it saves a few KB (5-10 KB), so it's worth it in the long run. – jason Mar 29 '11 at 07:18
  • Note that any element can be declared as preformatted by adding a `white-space: pre` CSS declaration. `` is usually another element that's preformatted. The whole idea of HTML minification is just pointless if you are not in a super high traffic scenario. If you want to save bandwidth, send the content gzipped. – Gordon Mar 29 '11 at 07:27
  • @jason - can you post the solution you found and mark your question as resolved? – Stephan Mar 29 '11 at 11:56

2 Answers

Regular expressions are known to be evil (see this and this) when it comes to parsing HTML.

That said, try to do what you need in another way, like using a DOM parser and customizing its HTML output functions.
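
To give an idea of what that could look like, here is a minimal sketch using PHP's bundled DOM extension. The function name is hypothetical, and the sketch makes a few assumptions: the input is a full HTML document, the text is ASCII or declares its charset (loadHTML otherwise assumes ISO-8859-1), and it is acceptable that saveHTML() re-serializes the page (it may add a doctype or normalize tags).

```php
<?php
// Hypothetical helper; assumes the dom and libxml extensions are enabled.
function collapseWhitespaceWithDom(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings (e.g. for unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every text node that is NOT inside an element whose whitespace matters.
    $nodes = $xpath->query(
        '//text()[not(ancestor::pre) and not(ancestor::textarea)'
        . ' and not(ancestor::script) and not(ancestor::style)]'
    );
    foreach ($nodes as $node) {
        // Collapse each run of whitespace into a single space.
        $node->nodeValue = preg_replace('/\s+/', ' ', $node->nodeValue);
    }
    return $doc->saveHTML();
}
```

Because the parser knows which text sits inside pre, no string searching for tags is needed; the trade-off is that the output is whatever the serializer produces rather than a byte-for-byte copy of the input.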

Andrea Spadaccini

If you are compressing for disk-space, you should consider using gz compression. (php.net/gz_deflate)
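
A rough sketch of what that might look like for a cached page, assuming the zlib extension is loaded. The two helper names are made up for illustration; gzencode() and gzdecode() are the zlib functions that write and read the gzip format.

```php
<?php
// Hypothetical cache helpers; assume the zlib extension is available.
function writeCompressedCache(string $path, string $html): void
{
    // Level 9 favors the smallest output; gzencode() returns gzip-format data.
    file_put_contents($path, gzencode($html, 9));
}

function readCompressedCache(string $path): string
{
    // Decompress the cached file back into the original HTML string.
    return gzdecode(file_get_contents($path));
}
```

Since the whitespace inside pre is compressed rather than removed, the page comes back exactly as it was written.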

Adam Wagner