0

I'm creating an "HTML editor" for a webpage of mine. At the moment, I only want the editor to allow entry of HTML and CSS and not JavaScript (or jQuery, for that matter).

I'm trying to find a way to disable the use of <script> or <script type="text/javascript"> </script> using PHP. However, my current approach produces messy output!

        $content_in_before = str_replace('<script','',$content_in_before);
        $content_in_before = str_replace('script>','',$content_in_before);

It's also not very well coded!

Is there a more bulletproof way of coding this that stops all types of JavaScript from being entered into this form (while still allowing CSS and HTML)?

Thanks in advance!

zuc0001
  • [`strip_tags()`](http://php.net/strip_tags) perhaps? – bishop Dec 20 '14 at 03:22
  • [Regular Expressions](http://php.net/manual/en/reference.pcre.pattern.syntax.php) – Mike Loffland Dec 20 '14 at 03:25
  • This will not solve your problem, you can still add javascript via event attributes like `onLoad`, `onClick`, etc. If you really want to make sure no scripts get uploaded, use another language like markdown or use a proven library. – jeroen Dec 20 '14 at 03:27
  • The use of attributes such as onLoad and onClick does not worry me, as there is only a minimal amount you can do with them. – zuc0001 Dec 20 '14 at 03:30
  • Anything you can do in a script tag, you can do in an `onLoad` attribute... – jeroen Dec 20 '14 at 03:32
  • Are you able to create functions through such an attribute? – zuc0001 Dec 20 '14 at 03:32
  • Sure, but what is your concern with functions? They're just containers for the actual code that gets executed. – jeroen Dec 20 '14 at 03:34
  • I'm not particularly concerned about the use of functions. However, the webpage uses AJAX for grabbing data, and I'm afraid the HTML editor will allow a user to execute code from within these files. If I shouldn't be concerned, then I'd allow JavaScript within this editor. – zuc0001 Dec 20 '14 at 03:36
  • no point only trying to remove the easy scripts to find and ignoring more malicious approaches – charlietfl Dec 20 '14 at 03:38
  • @zuc0001 Yes you are. The JavaScript contained within the attributes is automatically executed upon the action. For example, `onLoad` will execute the enclosed JS as soon as it's loaded. – Brandon Anzaldi Dec 20 '14 at 03:39
  • @zuc0001 As was suggested previously, Markdown would be an exceptional option, but if you want users to be able to use CSS, you'll have to find a solution for that. [PHP Markdown Library](https://michelf.ca/projects/php-markdown/) – Brandon Anzaldi Dec 20 '14 at 03:42

4 Answers

2

I'd recommend using a sanitization library such as HTML Purifier. Just stripping <script> tags isn't enough to prevent XSS attacks, because JS can also be executed automatically through attributes like onLoad, onMouseOver, onUnload, etc.

To remove tags while allowing some, you can use PHP's strip_tags() function, but it doesn't strip attributes, hence my recommendation for an HTML sanitization library. If you're able to run it, perhaps one of the best choices is Google's Caja library. It doesn't work in shared hosting environments since it's written in Java, but it can be hosted on Google's AppEngine.
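A minimal sketch of the strip_tags() limitation described above. The input string is an illustrative assumption, not taken from the question:

```php
<?php
// Illustrative input (an assumption for this sketch): one allowed tag
// carrying an event-handler attribute, plus a script element.
$input = '<p onclick="alert(1)">Hi</p><script>alert(2)</script>';

// Allow only <p>; every other tag is stripped.
$output = strip_tags($input, '<p>');

// The <script> tags are gone (their text content remains as plain text),
// but the onclick attribute on the allowed <p> tag is left untouched.
echo $output, PHP_EOL;
```

The surviving onclick attribute is the reason strip_tags() alone is not enough here.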

Also, simple regex solutions aren't always reliable, since even malformed tags can still be parsed. For example, <script > wouldn't be caught by simple regex detection of normal script tags unless it's looking for spaces after the tag name. It's possible to check for this, but using an established library would save you time, and would give you the added bonus of a battle-tested library.

Example: Script Tags with Spaces producing an alert
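To make the point about unreliable string matching concrete, here is a small sketch. The helper names strip_script_naive and strip_script_simple_regex are hypothetical: the first wraps the two str_replace() calls from the question, the second uses a deliberately simple pattern that allows no attributes or whitespace.

```php
<?php
// Hypothetical helper wrapping the two replacements from the question.
function strip_script_naive(string $html): string
{
    $html = str_replace('<script', '', $html);
    return str_replace('script>', '', $html);
}

// Hypothetical helper using a deliberately simple pattern.
function strip_script_simple_regex(string $html): string
{
    return preg_replace('/<script>.*?<\/script>/is', '', $html);
}

// str_replace() leaves debris behind even in the plain case.
$messy = strip_script_naive('<script>alert(1)</script>');
echo $messy, PHP_EOL; // prints ">alert(1)</"

// str_replace() is case-sensitive, so an uppercase tag passes unchanged.
$upper = '<SCRIPT>alert(1)</SCRIPT>';
echo strip_script_naive($upper), PHP_EOL;

// The simple pattern requires ">" directly after the tag name,
// so a tag with a space before ">" passes unchanged.
$spaced = '<script >alert(1)</script >';
echo strip_script_simple_regex($spaced), PHP_EOL;
```

Each bypass could be patched individually, but the list of variants is long, which is the argument for an established sanitization library.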

Brandon Anzaldi
  • Thank you very much! I decided to implement HTML Purifier onto the page and it works like a charm! – zuc0001 Dec 20 '14 at 04:03
  • This is the right way to do it. One could write a regex to strip the attributes, but that would require too much work, and regex isn't the right tool. – Ismael Miguel Dec 20 '14 at 16:03
1

You could use a regex like this:

echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var);

source: https://stackoverflow.com/a/1886842/2046700

Or, as stated, use a library to do this for you, such as: http://htmlpurifier.org/

Another possible example:

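<?php
   // Matches only <script> tags whose attributes mention "javascript"
   // (e.g. type="text/javascript"); a bare <script> tag is not matched.
   $javascript = '/<script[^>]*?javascript[^>]*?>.*?<\/script>/si';
   $noscript = '';
   $document = file_get_contents('test.html');
   echo preg_replace($javascript, $noscript, $document);
?>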
Community
clonerworks
1

Whitelist tags you permit, and attributes you permit, then remove everything else. You can use DOMDocument for this.

I wrote this piece of code once but never had anyone else review it:

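function legal_html($str, $tags='<a><b><br><i><span><table><tbody><tr><td><thead><th><img>', $attribArray=false) {
    // Attributes allowed to survive; everything else (including on* event handlers) is removed.
    if ($attribArray===false) {
        $attribs = array('id','class','src','href','alt');
    } else {
        $attribs = $attribArray;
    }
    // First pass: drop every tag that is not whitelisted.
    $stripped = strip_tags($str,$tags);
    // Wrap the fragment in a <div> so it can be located again after serialization.
    $dom = new DOMDocument();
    @$dom->loadHTML('<div>'.$stripped.'</div>');
    foreach ($dom->getElementsByTagName('*') as $node) {
        // Iterate backwards because removing an attribute shifts the remaining indexes.
        for ($i = $node->attributes->length -1; $i >= 0; $i--) {
            $attrib = $node->attributes->item($i);
            if (!in_array($attrib->name,$attribs)) $node->removeAttributeNode($attrib);
        }
    }
    // Note: the values of href and src are not checked, so javascript: URLs still get through.
    $stripped = $dom->saveHTML();
    // Cut out the contents of the wrapper <div>.
    $start = strpos($stripped,'<div>')+5;
    $end = strrpos($stripped,'</div>');
    $stripped = trim(substr($stripped,$start,$end-$start));
    return $stripped;
}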
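One gap worth noting in an attribute whitelist like this: href and src are permitted, but their values are not inspected, so a javascript: URL would survive. Below is a minimal sketch of a scheme check that could be applied to those attribute values. The function has_safe_scheme and its default list of allowed schemes are hypothetical and not part of the code above. Whitespace and control characters are removed before checking, on the assumption that a browser may ignore them.

```php
<?php
// Hypothetical helper (not part of the answer's code): returns true when
// $url is relative or uses one of the explicitly allowed schemes.
function has_safe_scheme(string $url, array $allowed = ['http', 'https', 'mailto']): bool
{
    // Drop whitespace and control characters so "java\tscript:" is seen as "javascript:".
    $compact = preg_replace('/[\x00-\x20]+/', '', $url);

    // No leading "scheme:" prefix means the URL is relative; treat it as safe.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $m)) {
        return true;
    }

    // Otherwise the scheme must be on the whitelist.
    return in_array(strtolower($m[1]), $allowed, true);
}

var_dump(has_safe_scheme('https://example.com/page'));  // bool(true)
var_dump(has_safe_scheme('images/photo.png'));          // bool(true)
var_dump(has_safe_scheme('javascript:alert(1)'));       // bool(false)
var_dump(has_safe_scheme(" Java\tScript:alert(1)"));    // bool(false)
```

Whitelisting schemes, rather than blacklisting javascript: alone, follows the same principle as whitelisting tags and attributes.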
Paul S.
-3

You can use something like this:

$content=$_POST['textbox'];

if(strpos($content,'<script>')!==false){
//show error;
}
else{
//proceed with work;
}
LalaByte
  • Although this is a good idea, I'd prefer the other text to still show; and somehow have the – zuc0001 Dec 20 '14 at 03:31