0

I wrote a PHP script to fetch email content.

The content is in HTML format.

I'd like to display the content, as below:

<?php 
$email_content = '
    <html>
        <script>alert("XSS");</script>
        <body>
            <div>Line1</div>
            <div>Line2</div>
        </body>
    </html>
';
echo $email_content;
?>

As you can see, this is vulnerable to XSS attacks. But if I use the htmlspecialchars function, the markup is escaped and no longer renders as formatted HTML. What should I do in this case? Thanks.

Cynial

2 Answers

5

HTML Purifier can do that:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

It takes dirty HTML (i.e. possibly containing JavaScript) and removes any script.

PHP doesn't have anything native or built in that can remove JavaScript the way HTML Purifier does. You could use DOMDocument, but this would be a lengthy task because JavaScript can execute in some attributes (onerror, onclick) and is not limited to <script></script>.
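To illustrate why the DOMDocument route gets lengthy, here is a minimal sketch that handles only the two cases named above: <script> elements and on* event attributes. The function name strip_scripts_sketch is made up for this example. It does not deal with javascript: URLs, CSS-based vectors, or the many other attack paths, so treat it as a demonstration of the problem rather than a replacement for HTML Purifier.

```php
<?php
// Illustrative sketch only: removes <script> elements and on* attributes.
// It is NOT a complete sanitizer (javascript: URLs, CSS, etc. are untouched).
function strip_scripts_sketch(string $html): string
{
    $doc = new DOMDocument();

    // Email markup is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the results to an array first so removal does not disturb iteration.
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // Drop every attribute whose name starts with "on" (onclick, onerror, ...).
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        if (stripos($attr->nodeName, 'on') === 0) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    return $doc->saveHTML();
}

$dirty = '<html><script>alert("XSS");</script><body>'
       . '<div>Line1</div><img src="x" onerror="alert(1)"></body></html>';
$clean = strip_scripts_sketch($dirty);
echo $clean;
```

Even this small version needs an HTML parser, an XPath pass per rule, and care around malformed input, which is why a maintained library is the more practical choice.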

MrCode
  • I just tried it, not bad. But it removed content unexpectedly. The email content can be very complex, and HTML Purifier doesn't seem to work reliably on it. – Cynial Jun 20 '13 at 08:45
  • What did it remove? HTML Purifier has lots of config options to change what it does/doesn't remove. The default config might not be exactly what you want. – MrCode Jun 20 '13 at 08:47
  • The signature line. There is nothing unsafe in it. The weird thing is that some signatures appear and some are removed. It seems unstable. But maybe you're right, I need to dive into the config. – Cynial Jun 20 '13 at 08:56
  • I figured it out. My email content has multiple tags, HTML Purifier just fetches
    content from tag (ref. function: tokenizeHTML), so we need to explode it and use the purifyArray function.
    – Cynial Jun 21 '13 at 06:23
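As a rough illustration of the config options mentioned in the comments, the sketch below restricts the allowed elements and URL schemes. The particular whitelist is an assumption chosen for this example, not a recommendation from the thread, and the include path is the same placeholder used in the answer.

```php
<?php
require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Assumed whitelist for illustration: adjust to what your emails actually need.
$config->set('HTML.Allowed', 'div,p,br,b,i,a[href],img[src|alt]');

// Only allow these URL schemes in href/src values.
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
```

If content disappears unexpectedly, comparing the output under the default config and under a wider whitelist is one way to find out which rule removed it.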
1

You should use the strip_tags() function and allow only the tags that you want users to add.

echo strip_tags($text, '<p><a>');

This line allows <p> and <a> tags; every other tag will be removed.

htmlspecialchars() works totally differently.
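One caveat, raised in the comments on this answer, is that strip_tags() filters by tag name only and leaves attributes on allowed tags alone. The short sketch below uses a made-up input string to show an onclick handler surviving on an allowed <a> tag.

```php
<?php
// Made-up input for illustration: an allowed tag carrying an event handler.
$text = '<p>Hi</p><a href="#" onclick="alert(1)">link</a><script>alert(2)</script>';

// Keep <p> and <a>; all other tags are removed, but their text content stays.
$result = strip_tags($text, '<p><a>');

// The <script> tags are gone, yet the onclick attribute is still present.
echo $result;
```

So strip_tags() on its own is only safe when no tags are allowed, or when the output is cleaned further.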

From manual:

The translations performed are:

 '&' (ampersand) becomes '&amp;'
 '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
 "'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
 '<' (less than) becomes '&lt;'
 '>' (greater than) becomes '&gt;'
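A small sketch of those translations in practice, using an arbitrary sample string. ENT_QUOTES is passed explicitly so that the single quote is converted as well.

```php
<?php
// Arbitrary sample containing all five special characters.
$raw = '<a href="x">Tom & Jerry\'s</a>';

// ENT_QUOTES converts both double and single quotes.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $escaped;
// prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#039;s&lt;/a&gt;
```

The browser then shows the markup as literal text, which is why this function is safe but cannot preserve the email's formatting.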

There is a very nice article about XSS prevention and CSRF prevention; read it.

Robert
  • 2
    If I need `img` tag, but they use image XSS attacks? – Cynial Jun 20 '13 at 07:54
  • 2
    `strip_tags()` is not good enough because XSS can be present in attributes, it's not limited to ``. Also in some circumstances `strip_tags()` can be bypassed. – MrCode Jun 20 '13 at 08:01
  • @Cynial it's not **XSS** attack it's **CSRF** then. Read more about CSRF here http://stackoverflow.com/questions/1780687/preventing-csrf-in-php – Robert Jun 20 '13 at 08:01
  • 1
    @Robert that is wrong. an `` tag can also trigger XSS. Example: ``... Never output untrusted HTML. It's a battle you can hardly win. There are an endless amount of attack vectors. – samuirai Jun 20 '13 at 08:18