How can I prevent XSS with HTML/PHP?

Question

How do I prevent XSS (cross-site scripting) using just HTML and PHP?

I've seen numerous other posts on this topic, but I have not found an article that clearly and concisely states how to actually prevent XSS.

Just a note that this won't solve the case where you might want to use user input as an HTML attribute. For example, the source URL of an image. Not a common case, but an easy one to forget. — Michael Mior, May 16 '11 at 17:12
@MichaelMior here is a solution to prevent XSS in `href` or `src` HTML attribute: https://stackoverflow.com/questions/19047119/href-security-prevent-xss-attack/19047533#19047533 — baptx, Jul 20 '19 at 11:30
There's a nice article [here](https://medium.com/@mpreziuso/injection-vulnerabilities-cross-site-scripting-xss-7fd9dc28cc47) that explains XSS and how to prevent it in different languages (incl. PHP). — XCore, Apr 14 '20 at 20:12

score 346 · Answer 1 · edited Aug 03 '23 at 13:30

Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser in HTML context.

The correct way to use this function is something like this:

echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
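
As a rough sketch of how that call behaves, the wrapper below escapes a sample payload for an HTML context. The helper name `e()` and the payload are illustrative assumptions, not part of the answer above:

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute.
function e(string $value): string
{
    // ENT_QUOTES encodes both double and single quotes; ENT_SUBSTITUTE replaces
    // invalid UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$payload = '<script>alert("xss")</script>';
echo '<p>' . e($payload) . '</p>', PHP_EOL;
// prints: <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

Wrapping the call in one short function makes it harder to forget the flags and the charset at individual call sites.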

Google Code University also has some very educational videos on web security.

edited Aug 03 '23 at 13:30

Your Common Sense

answered Jan 03 '10 at 20:17

Alix Axel

@TimTim: For most cases, yeah. However, when you need to allow HTML input things get a little trickier and if this is the case I recommend you use something like http://htmlpurifier.org/ – Alix Axel Jan 03 '10 at 20:23
@Alix Axel, so is your answer to use htmlspecialchars or to use http://htmlpurifier.org/? – TimTim Jan 03 '10 at 20:39
If you need to accept HTML input use HTML Purifier, if not use `htmlspecialchars()`. – Alix Axel Jan 03 '10 at 20:41
htmlspecialchars or htmlentities ? Check here http://stackoverflow.com/questions/46483/htmlentities-vs-htmlspecialchars – kiranvj Nov 16 '12 at 06:19
Most of the time it is correct, but it is not as simple as that. You should consider the cases of putting untrusted strings into HTML, JS, and CSS, and of putting untrusted HTML into HTML. Look at this: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet – bronze man May 29 '14 at 17:43
Setting the encoding (third parameter) in `htmlentities` / `htmlspecialchars` should not be needed anymore since PHP 5.6: "Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input." https://www.php.net/manual/en/function.htmlspecialchars.php – baptx Jul 20 '19 at 09:25
How to handle this problem in windows-1250 encoding? – sajushko Jul 02 '21 at 09:58

score 19 · Answer 2 · edited Aug 03 '23 at 14:31

One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.

XSS attacks usually take the form of an inserted link to off-site JavaScript code that carries malicious intent toward the user. Read more about it here.

You'll also want to test your site. It looks like Easy XSS is now the way to go.

edited Aug 03 '23 at 14:31

Peter Mortensen

answered Jan 03 '10 at 20:12

James Kolpack

What exactly do I need to sanitize the input against? Is there one particular character/string that I have to watch out for? – TimTim Jan 03 '10 at 20:14
@TimTim - no. **All user input** should **always** be considered as inherently hostile. – zombat Jan 03 '10 at 20:28
Besides, internal data (from employees, sysadmins, etc.) could be unsafe too. You should identify and monitor (logging the date and user) any data that is displayed with interpretation. – Samuel Dauzon Oct 04 '18 at 08:40

score 19 · Answer 3 · edited Aug 03 '23 at 14:22

I am crossposting this as a consolidated reference from the SO Documentation beta which is going offline.

Problem

Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.

For example, if a third-party site contains a JavaScript file:

// http://example.com/runme.js
document.write("I'm running");

And a PHP application directly outputs a string passed into it:

<?php
echo '<div>' . $_GET['input'] . '</div>';

If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:

<div><script src="http://example.com/runme.js"></script></div>

The third-party JavaScript code will run and the user will see "I'm running" on the web page.

Solution

As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.

Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.

PHP provides a few ways to escape output depending on the context.

Filter Functions

PHP's Filter Functions allow the input data to the PHP script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
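
A minimal sketch of validation with the filter extension follows. The field names (`age`, `email`), the range limits, and the helper name `validate_signup()` are made up for illustration:

```php
<?php
// Validate untrusted values with filter_var(). Invalid values become null
// so the caller can reject the request or ask for the field again.
function validate_signup(array $raw): array
{
    $age = filter_var($raw['age'] ?? null, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    $email = filter_var($raw['email'] ?? null, FILTER_VALIDATE_EMAIL);

    // filter_var() returns false when validation fails.
    return [
        'age'   => $age === false ? null : $age,
        'email' => $email === false ? null : $email,
    ];
}

var_dump(validate_signup(['age' => '42', 'email' => 'user@example.com']));
var_dump(validate_signup(['age' => '<script>', 'email' => 'not-an-email']));
```

Validation of this kind narrows what the application accepts; values that pass should still be escaped when they are written into a page.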

HTML Encoding

htmlspecialchars() will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:

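<?php
// ENT_QUOTES also encodes single quotes, which matters inside attribute values
echo '<div>' . htmlspecialchars($_GET['input'], ENT_QUOTES, 'UTF-8') . '</div>';
// or (this filter emits numeric entities such as &#60;, which render the same)
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';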

Would output:

<div>&lt;script src=&quot;http://example.com/runme.js&quot;&gt;&lt;/script&gt;</div>

Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:

<script src="http://example.com/runme.js"></script>

URL Encoding

When outputting a dynamically generated URL, PHP provides the urlencode() function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:

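<?php
// urlencode() percent-encodes the value so it cannot break out of the query string
$input = urlencode($_GET['input']);
// or, with the filter extension (FILTER_SANITIZE_URL only strips characters and lets quotes through)
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_ENCODED);
echo '<a href="http://example.com/page?input=' . $input . '">Link</a>';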

Any malicious input will be converted to an encoded URL parameter.
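
When the user supplies a whole URL rather than a single parameter (the `href`/`src` case raised in the comments on the question), encoding alone does not block schemes such as `javascript:`. One possible approach, sketched here with a hypothetical helper named `safe_href()`, is to allow only known schemes and then HTML-escape the result:

```php
<?php
// Hypothetical helper: return a value that is safe to place inside href="...",
// or "#" when the URL does not use an allowed scheme.
function safe_href(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Only absolute http(s) URLs are accepted in this sketch; parse_url()
    // returns null for a missing scheme and false for a malformed URL.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#';
    }

    // Escape for the attribute context so quotes and ampersands stay inert.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

echo '<a href="' . safe_href('javascript:alert(1)') . '">Link</a>', PHP_EOL;
echo '<a href="' . safe_href('https://example.com/a?x=1&y=2') . '">Link</a>', PHP_EOL;
```

Rejecting relative URLs is a simplification of this sketch; an application that needs them would have to extend the check.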

Using specialised external libraries or OWASP AntiSamy lists

Sometimes you will need to accept HTML or other kinds of code as input. In that case you need to maintain a list of allowed words (whitelist) and a list of disallowed words (blacklist).

You can download standard lists from the OWASP AntiSamy website. Each list is suited to a specific kind of interaction (eBay API, TinyMCE, etc.), and they are open source.

There are also libraries that filter HTML and prevent XSS attacks for the general case. They perform at least as well as the AntiSamy lists and are very easy to use. HTML Purifier is one example.

Is it correctly understood that the xss attack will only be executed if you output the data from a form to the page immediately without any validation or sanitation? If you instead run e.g. $input = htmlspecialchars($_POST['input']) in your php code you can safely output $input ? — Flemming Lemche, Jan 01 '23 at 22:00

score 13 · Answer 4 · edited Aug 17 '22 at 08:42

Many frameworks help handle XSS in various ways. When rolling your own, or if there's some XSS concern, you can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7). I typically add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. This way, all user input gets sanitized in one central location. If this is done at the beginning of a project, or before your database is poisoned, you shouldn't have any issues at output time: it stops garbage in, garbage out.

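/* Prevent XSS input */
/* Note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1. */
/* filter_input_array() returns null when no such input exists, hence the fallback. */
$_GET   = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING) ?: array();
$_POST  = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING) ?: array();
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;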

The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.

If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:

//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
   return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
   echo xssafe($data);
}

Interesting solution, overwriting all params this way. Thanks for sharing! — Nicky Kouffeld, Dec 29 '22 at 23:28

score 12 · Answer 5 · answered Jul 30 '15 at 02:10

In order of preference:

1. If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does: `{{ var|e('html_attr') }}`
2. If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
3. Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.

Also, make sure you escape on output, not on input.
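
To illustrate escaping on output, here is a small sketch in which the raw value is kept untouched and only encoded at render time. The `$comments` array stands in for rows loaded from a database, and `render_comment()` is a hypothetical name:

```php
<?php
// Sketch: keep the raw value in storage and escape only when rendering.
function render_comment(array $comment): string
{
    $author = htmlspecialchars($comment['author'], ENT_QUOTES, 'UTF-8');
    $body   = htmlspecialchars($comment['body'], ENT_QUOTES, 'UTF-8');

    // With ENT_QUOTES the escaped value is safe both in a quoted attribute
    // and in a text node.
    return '<article title="' . $author . '"><p>' . $body . '</p></article>';
}

// Stand-in for stored, unmodified user input.
$comments = [
    ['author' => 'Tim "T" O\'Neil', 'body' => '1 < 2 && <b>bold?</b>'],
];

foreach ($comments as $comment) {
    echo render_comment($comment), PHP_EOL;
}
```

Because the stored data is unaltered, the same rows can later be rendered into another context (JSON, plain-text email) with the escaping that context needs.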

answered Jul 30 '15 at 02:10

Scott Arciszewski

Is this comment still valid: **escape on output, not on input**? Don't you think inputs can be malicious, considering the multiple technologies that process them across the application stack? – Lakshminarayanan Guptha Apr 14 '21 at 15:51
Yes it's still valid. You should store it as-is and then escape when displaying it. If you need to update your output escaping code to mitigate a vulnerability, it's better to have the unaltered, unmangled input stored for updating your unit tests. – Scott Arciszewski Apr 19 '21 at 03:17

score 3 · Answer 6 · answered Oct 02 '15 at 04:02

<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');

// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);

// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);

// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);

// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);

do
{
    // Remove really unwanted tags
    $old_data = $data;
    $data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);

// we are done...
return $data;
}

You shouldn't use `preg_replace` as it uses `eval` on your input. https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet#Code_Injection — CrabLab, Mar 11 '17 at 17:19

score 3 · Answer 7 · edited Aug 03 '23 at 14:30

You can also set some XSS-related HTTP response headers via header(...):

X-XSS-Protection "1; mode=block"

This makes sure the browser's XSS protection mode is enabled. Most current browsers have removed their built-in XSS filters, so this header mainly affects older browsers.

Content-Security-Policy "default-src 'self'; ..."

This enables browser-side content security. See the following for Content Security Policy (CSP) details:

Content Security Policy Reference

Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
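
As a sketch of what sending such a policy from PHP could look like, the example below builds a header value with a per-request nonce. The directive set and the function name `build_csp()` are assumptions chosen for illustration, not a policy recommended by this answer:

```php
<?php
// Sketch: build a Content-Security-Policy value with a per-request nonce.
// The directives here are an example; a real site needs its own policy.
function build_csp(string $nonce): string
{
    return "default-src 'self'; script-src 'self' 'nonce-" . $nonce . "'; "
        . "object-src 'none'; base-uri 'self'";
}

// A fresh random nonce for every response.
$nonce = base64_encode(random_bytes(16));

// header() must run before any output is sent; skipped when run from the CLI.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Security-Policy: ' . build_csp($nonce));
}

// Only inline scripts carrying the matching nonce attribute may run.
echo '<script nonce="' . htmlspecialchars($nonce, ENT_QUOTES, 'UTF-8') . '">'
    . 'console.log("ok");</script>', PHP_EOL;
```

With this policy, an injected inline script without the nonce, or a script loaded from another origin, would be refused by browsers that enforce CSP.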

For a general collection of useful HTTP response headers concerning the security of your web application, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers

The last link is broken (404). – Peter Mortensen Aug 03 '23 at 14:30

score -1 · Answer 8 · edited Aug 03 '23 at 13:28

The best way to protect your HTML output is to use the htmlentities() function.

Example:

htmlentities($target, ENT_QUOTES, 'UTF-8');

edited Aug 03 '23 at 13:28

Your Common Sense

answered Nov 17 '19 at 16:22

Marco Concas

score -2 · Answer 9 · edited Aug 03 '23 at 14:25

Use htmlspecialchars() in PHP. In client-side JavaScript, try to avoid using:

element.innerHTML = "…";
element.outerHTML = "…";
document.write(…);
document.writeln(…);

where the value (var) is controlled by the user.

Also, avoid eval(var). If you have to use any of these, JavaScript-escape the value, HTML-escape it, and possibly do more, but for the basics this should be enough.
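
On the PHP side, one way to do the JavaScript escaping when a server value has to be handed to an inline script is json_encode() with the JSON_HEX_* flags. This is a sketch under that assumption, and the helper name `js_literal()` is hypothetical:

```php
<?php
// Hypothetical helper: encode a PHP value as a JavaScript literal that can be
// embedded inside an inline <script> block.
function js_literal($value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes, so the
    // value cannot close the <script> element or break out of a string.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$name = '</script><script>alert(1)</script>';
echo '<script>var userName = ' . js_literal($name) . ';</script>', PHP_EOL;
```

The resulting variable is still untrusted data on the client, so assigning it to textContent rather than innerHTML keeps it from being parsed as markup.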