7

I use PHP Tidy to process HTML input for my database:

$fragment = tidy_repair_string($dom->saveHTML(), array('output-xhtml'=>1,'show-body-only'=>1));

I have php_tidy enabled on my server, but my live server doesn't support tidy:

Fatal error: Call to undefined function tidy_repair_string() in /customers/0/5/a/mysite.com/httpd.www/models/functions.php on line 587

Is there an alternative I can use to fix this problem?

PatomaS
Run
  • Maybe only the OO way works: `$tidy = new tidy(); $clean = $tidy->repairString($dom->saveHTML(), ...);` – Rudie Aug 01 '11 at 08:57
  • nope...but I found another solution for this already which is using regex... thanks! – Run Aug 02 '11 at 01:40

5 Answers

8

I found htmLawed to be very fast. I found it when looking for an alternative to HTMLPurifier, which was very slow.
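A minimal sketch of what the swap might look like, assuming `htmLawed.php` has been downloaded next to the script. The path and the wrapper name `lawed_repair` are assumptions for illustration, not part of this answer. The `valid_xhtml` and `balance` config keys are the closest htmLawed counterparts to the `output-xhtml` repair in the question.

```php
<?php
// Assumption: htmLawed.php sits in the same directory as this script.
$lib = __DIR__ . '/htmLawed.php';
if (is_file($lib)) {
    require_once $lib;
}

/**
 * Hypothetical wrapper: repairs an HTML fragment with htmLawed.
 */
function lawed_repair(string $html): string
{
    if (!function_exists('htmLawed')) {
        throw new RuntimeException('htmLawed.php is not loaded');
    }
    // 'valid_xhtml' => 1 requests XHTML-valid output,
    // 'balance' => 1 closes unbalanced tags.
    return htmLawed($html, ['valid_xhtml' => 1, 'balance' => 1]);
}
```

With that in place, the call from the question would become something like `$fragment = lawed_repair($dom->saveHTML());`.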

Wadih M.
Justin
  • htmLawed is the winner for me. I am using it in a Moodle site, because I get that same "undefined function tidy_repair_string()" error. – Mike Finch Mar 15 '16 at 20:34
6

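Or simply pass it through a DOMDocument object:

$dirty = "<xml>some content</xml>";
$x = new DOMDocument;
$x->loadHTML($dirty);   // the HTML parser repairs malformed markup while loading
$clean = $x->saveXML(); // serialize the repaired document as well-formed XML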
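To get closer to what `show-body-only` did in the question, the same idea can be wrapped in a helper that returns only the repaired fragment. This is a sketch, not a drop-in replacement for tidy: the function name `repair_fragment` is made up, it assumes the `dom` extension is available and that the input is a UTF-8 body fragment, and the `<?xml encoding="UTF-8">` prefix is a common workaround for hinting the encoding to the HTML parser.

```php
<?php
/**
 * Hypothetical helper: repairs an HTML fragment with DOMDocument and
 * returns only the contents of <body>, serialized as XML.
 */
function repair_fragment(string $dirty): string
{
    // Collect libxml warnings about bad markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument('1.0', 'UTF-8');
    // The XML declaration tells the HTML parser to treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><body>' . $dirty . '</body>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> so the html/body wrappers are left out.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo repair_fragment('<p>Hello <b>world'), PHP_EOL;
```

Unlike HTML Purifier or htmLawed, this only rebalances the markup; it does no filtering of scripts or attributes.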
Flavien Volken
  • Thank you, saved me. FYI `libxml_use_internal_errors(true);` will suppress PHP warnings generated by bad HTML. – Krista K Feb 22 '13 at 03:31
  • @visualex I did not try, but normally it depends on the "encoding" attribute you put on the root of the XML file. for instance: you can also provide the encoding programmatically. More to read [here](http://stackoverflow.com/questions/3575109/php-using-domdocument-whenever-i-try-to-write-utf-8-it-writes-the-hexadecimal-n) – Flavien Volken Nov 12 '14 at 06:53
4

HTML Purifier can rewrite HTML to be standards-compliant like HTML Tidy. If you need to filter that input for XSS prevention, etc., it will do that as well.

It's all PHP, so you should be able to use it on any server.
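As a rough sketch of the basic usage, assuming the library was unpacked into an `htmlpurifier/` directory next to the script (that path and the wrapper name `purify_fragment` are illustrative assumptions):

```php
<?php
// Assumption: HTML Purifier lives in ./htmlpurifier/ relative to this script.
$autoload = __DIR__ . '/htmlpurifier/library/HTMLPurifier.auto.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical wrapper: cleans and repairs a fragment with HTML Purifier.
 */
function purify_fragment(string $dirty): string
{
    if (!class_exists('HTMLPurifier')) {
        throw new RuntimeException('HTML Purifier is not loaded');
    }
    $config = HTMLPurifier_Config::createDefault();
    // Ask for XHTML output, similar in spirit to tidy's output-xhtml option.
    $config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}
```

The call from the question would then look like `$fragment = purify_fragment($dom->saveHTML());`.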

Chris Hepner
4

If you are on a RedHat / CentOS / Fedora Linux box and have root access to your server, you can run...

yum install php-tidy

as root. Then restart Apache and that should get you going.

There may be errors about missing dependencies that need to be added but usually the above command will be all you need.

Other distributions will have slightly different commands but something similar should be available.

On Windows you need to install it manually. Instructions can be found here... http://devzone.zend.com/article/761#Heading3
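After installing, a quick way to confirm that PHP actually picked up the extension is a small check script like the sketch below. The helper name `tidy_available` is made up for illustration. Note that the command line and the web server may load different `php.ini` files, so it is worth running this through the web server too.

```php
<?php
/**
 * Hypothetical helper: reports whether the tidy extension is usable.
 */
function tidy_available(): bool
{
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

if (tidy_available()) {
    // Same options as in the question.
    echo tidy_repair_string('<p>unclosed', ['output-xhtml' => 1, 'show-body-only' => 1]);
} else {
    // Show which php.ini is in use so the extension line can be added there.
    $ini = php_ini_loaded_file();
    echo 'tidy is not loaded; php.ini in use: ', ($ini === false ? '(none)' : $ini), PHP_EOL;
}
```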

Night Owl
0

PHP SuperTidy?

I got fed up with how poorly PHP Tidy works, so I began writing this one. It should tidy up any JavaScript as well. It is not fully tested, so you might find some edge cases that need to be accounted for. It would be neat to see some of the other talented developers out there expand on this. P.S. I know this is an old thread, but I wanted to share this somewhere...

It's a start. Enjoy.

SuperTidy Implementation

$Tidy = new SuperTidy($html);
$Tidy->SetIndentSize(4);
$Tidy->SetOffset(0);
echo $Tidy->BeautifiedHTML();

SuperTidy Class:

<?php
    class SuperTidy
    {
        /*
            Name: PHP SuperTidy
            Author: Paul Ishak
            Copyright: 2020
        */
        private $usedJSNames = [];
        private $indentSize = 4;
        private $sourceHtml = "";
        private $offset = -4;
        public function SetIndentSize($size)
        {
            $this->indentSize = $size;
        }
        public function __construct($html)
        {
            $this->sourceHtml = $html;
        }
        public function OriginalSource()
        {
            return $this->sourceHtml;
        }
        public function UpdateSource($html)
        {
            $this->sourceHtml = $html;          
        }
        public function SetOffset($offset)
        {
            $this->offset = $offset;
        }
        function BeautifiedHTML()
        {
            $this->usedJSNames = [];
            $buffer = $this->sourceHtml;
            $spacesPerIndent = $this->indentSize;
            $JSPlaceHolders = [];
            $out = str_replace("\r","\n",$buffer);
            $out = str_replace("\n\n","\n",$out);
            $out = str_replace("<script", "\n<script",$out);
            $out = str_replace("</script>", "\n</script>\n",$out);
            $lines = explode("\n",$out);
            $javascript = "";
            $outLines = [];
            for($i = 0; $i < count($lines); $i++)
            {
                $line = $lines[$i];
                $line = trim($line);
                if($line == "</script>") continue;
                if(strlen($line) >= strlen("<script"))
                {
                    if(strtolower(substr($line,0,7)) == "<script")
                    {
                        if(strpos(strtolower($line),"</script>"))
                        {
                            $outLines[] = $line;
                        }
                        else
                        {
                            $counter = $i + 1;
                            // Guard against an opening <script> tag on the very last line.
                            $jsLine = $lines[$counter] ?? "";
                            $javascript = "";
                            $lineCount = 0;
                            while(strtolower(trim($jsLine)) !== "</script>")
                            {
                                $lineCount++;
                                $javascript.=$jsLine."\n";
                                $counter++;
                                if($counter > count($lines) - 1) break;
                                $jsLine = $lines[$counter];
                            }
                            $i+=$lineCount;
                            if(trim($javascript) == "")
                            {
                                $i++;
                                $line2 = $lines[$i] ?? "";
                                $thisLine = $line.$line2;
                                if(strpos($thisLine,"src="))
                                {
                                    $outLines[] = $thisLine;
                                }
                                else
                                {
                                    $chars = str_split($thisLine);
                                    
                                    $stO = strpos(strtolower($thisLine),"<script");
                                    $enO = strpos(strtolower($thisLine),">",$stO)+1;
                                    $tagO = substr($thisLine,$stO,$enO);
                                    
                                    $stC = strpos(strtolower($thisLine),"</script");
                                    $enC = strpos(strtolower($thisLine),">",$stC)+1;
                                    $tagC = substr($thisLine,$stC,$enC);
                                    $javascript = substr($thisLine,$enO,$stC - $enO);
                                    $outLines[] = "<script type='application/javascript'>".$javascript."</script>";             
                                }
                            }
                            else
                            {
                                $unique = $this->GetUniqueJSPlaceHolder($out);
                                $JSPlaceHolders[$unique] = ['javascript'=>$javascript];
                                $outLines[] = "<$unique type='application/javascript'></$unique>";
                            }
                        }
                    }
                    else
                    {
                        $outLines[] = $line;
                    }
                }
                else
                {
                    $outLines[] = $line;
                }
            }
            $modHTML = "";
            foreach($outLines as $line)
            {
                $modHTML .= $line."\n";
            }
            $modHTML = str_replace("\n","",$modHTML);
            $modHTML = str_replace(">",">\n",$modHTML);
            $modHTML = str_replace("<","\n<",$modHTML);
            $modHTML = str_replace("\n\n","\n",$modHTML);
            $lines = explode("\n",$modHTML);
            $outLines = [];
            $indentLevel = -$spacesPerIndent + $this->offset;
            $openTags = [];
            foreach($lines as $line)
            {
                $line = trim($line);
                if($line !== "") $outLines[] = $line;
            }
            $modHTML = "";
            for($j = 0; $j < count($outLines); $j++)
            {
                $line = $outLines[$j];
                $isCloseTag = false;
                $firstChar = substr($line,0,1);
                $isMetaTag = substr(strtolower($line),1, 4) == "meta" ? true: false;
                $isDocType = substr(strtolower($line),2, 7) == "doctype" ? true: false;
                $isSelfClosing = substr($line, strlen($line)-2,1) == "/" ? true : false;
                $beginComment = substr($line, 0,4) == "<!--" ? true : false;
                $applyIndent = ($firstChar == "<") ? true : false;
                $applyIndent = $isMetaTag     ? false : $applyIndent;
                $applyIndent = $isDocType     ? false : $applyIndent;
                $applyIndent = $isSelfClosing ? false : $applyIndent;
                $applyIndent = $beginComment ? false : $applyIndent;
                $contentIndent = $applyIndent ? false : true;
                $tag = "";
                if($applyIndent) 
                { //This is a tag only
                    $tagInner = substr($line,1,-1);
                    $tag = "";
                    for($i = 0; $i < strlen($tagInner); $i++)
                    {
                        $char = substr($tagInner,$i,1);
                        if($char == " ") break;
                        if($char == ">") break; 
                        $tag .=$char;
                    }
                    $isCloseTag = substr($tag,0,1) == "/" ? true: false;
                    
                    if($isCloseTag)
                    {
                        $indentLevel -= $spacesPerIndent;   
                    }
                    else
                    {
                        $indentLevel += $spacesPerIndent;
                        $findTag = "</$tag>";
                        // The last line has no successor; avoid an undefined-offset warning.
                        $line2 = $outLines[$j+1] ?? "";
                        if(strtolower($line2) == strtolower($findTag))
                        {
                            $line = $line.$line2;
                            $j+=1;
                            $indentLevel -= $spacesPerIndent;
                            $isCloseTag = true;
                        }
                    }
                }
                $spaces = $indentLevel;
                $spaces += $contentIndent ? $spacesPerIndent : 0;
                $spaces += $isCloseTag    ? $spacesPerIndent : 0;
                $prependSpace = str_repeat(" ", $spaces);
                $line = $prependSpace.$line;
                if($tag !== "")
                {
                    $keys = array_keys($JSPlaceHolders);
                    if(in_array($tag,$keys))
                    {
                        $JSPlaceHolders[$tag]['indent'] = $indentLevel;
                    }
                }           
                $modHTML .= $line."\n";
            }
            $keys = array_keys($JSPlaceHolders);
            foreach($keys as $key)
            {
                $javascript = $JSPlaceHolders[$key]['javascript'];
                $indentOffset = $JSPlaceHolders[$key]['indent']+1;
                $javascript = $this->JSTidy($javascript, $indentOffset + ($spacesPerIndent*2), $spacesPerIndent);
                $otStart = strpos($modHTML,"<$key");
                $otEnd   = strpos($modHTML,">", $otStart)+1;
                $ot = substr($modHTML,$otStart, ($otEnd - $otStart));
                $otOut = str_replace($key, "script",$ot);
                $ctStart = strpos($modHTML,"</$key", $otEnd);
                $ctEnd = strpos($modHTML,">", $ctStart)+1;
                $ct = substr($modHTML,$ctStart, ($ctEnd - $ctStart));
                $ctOut = str_repeat(" ",$indentOffset+$spacesPerIndent-1).str_replace($key, "script",$ct);
                $otOut .= "\n".$javascript."\n";
                $modHTML = str_replace($ot,$otOut,$modHTML);
                $modHTML = str_replace($ct,$ctOut,$modHTML);
            }
            return $modHTML;
        }
        function JSTidy($javascript, $indentOffset, $spacesPerIndent)
        {
            $javascript = str_replace("{", "\n{",$javascript);
            $javascript = str_replace("}", "\n}",$javascript);
            $minJs = preg_replace(array("/\s+\n/", "/\n\s+/", "/ +/"), array("\n", "\n ", " "), $javascript);
            $jsLines = explode("\n",$minJs);
            $jsOut = "";
            $indent = $indentOffset;
            $count = count($jsLines);
            for($j = 0; $j < $count;$j++)
            {
                $line = trim($jsLines[$j]);
                if($line == "") continue;
                $c = substr($line,0,1);
                if($c == "}") $indent = $indent - $spacesPerIndent;
                $i = 0;
                $outLine = "";
                while(++$i < $indent)
                {
                    $outLine .=" ";
                }
                $outLine .=$line;
                $jsOut .=$outLine;
                if($j < $count - 2)
                {
                    $jsOut .="\n";
                }
                if($c == "{") $indent = $indent + $spacesPerIndent;             
            }
            return $jsOut;
        }
        function GetUniqueJSPlaceHolder($targetHTML)
        {
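            // Generate "JS" + hash placeholders until one is found that does not
            // occur in the HTML and has not been handed out before.
            $str = (string) rand(); 
            $unique = "JS".strtoupper(hash("sha256", $str));
            while(strpos($targetHTML,$unique) !== false || in_array($unique, $this->usedJSNames))
            {
                $str = (string) rand(); 
                $unique = "JS".strtoupper(hash("sha256", $str));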
            }
            $this->usedJSNames[] = $unique;
            return $unique;
        }
    }
?>
Paul Ishak
  • I am also tired of these solutions not working for me. If you share a git repository for this, I can make a pull request to contribute modifications I made to this, like ignoring indents for tags and more. I like your custom approach. – Martin Apr 12 '22 at 21:07
  • Done: https://github.com/PaulIshak/supertidy – Paul Ishak Apr 20 '22 at 16:16