
Note: this post is only about XSS attacks, not SQL injection, since we already use prepared statements.

Hi all,

I plan to encode my output to protect against XSS attacks. So far, I have read that the "recommended" approach for UTF-8 websites is to pass every piece of user input through htmlspecialchars() when it is output, e.g., in every relevant echo() or print() statement (at least for websites that do not need to render HTML contained in user input).

This is noted in "How to prevent XSS with HTML/PHP?" and "How can I sanitize user input with PHP?"

However, there are too many cases where user input data is being printed out on the site I'm working on, and it spreads over numerous files/web pages. It would be a mammoth project to specifically address every single related echo() and print() statement. Thus, I thought about iterating over the whole user input data object retrieved from the backend before printing out its fields with echo() or print(). For example, with this helper function:

// helper function
function xss_recursive_object_iterator(&$object)
{
    if ($object === null) {
        return;
    }
    if (is_object($object) || is_array($object)) {
        foreach ($object as $key => &$field) {
            if (is_string($field)) {
                $cleaned_field = htmlspecialchars($field, ENT_QUOTES, 'UTF-8');
                // maybe additional operations for output encoding (but which)
                // ...
                $field = $cleaned_field;
            } else if (is_array($field) || is_object($field)) {
                // recurse into nested arrays/objects (the call must use this function's own name)
                xss_recursive_object_iterator($field);
            }
        }
        unset($field);
    }
}

...

// clean object with user input data retrieved from backend with the above function
xss_recursive_object_iterator($user_data_object);

...

// output of user input strings from the XSS filtered object
echo($user_data_object->field_string1);
echo($user_data_object->field_string2);

...

Instead of applying it to every single echo()/print() field:

echo(htmlspecialchars($user_data_object->field_string1, ENT_QUOTES, 'UTF-8'));
echo(htmlspecialchars($user_data_object->field_string2, ENT_QUOTES, 'UTF-8'));
...

Question 1

What are the drawbacks of iterating over the whole object and applying the encoding operations to every field beforehand, as shown above? Would this leave any XSS output-encoding issues open?

Question 2

Additionally, for user data printed inside <script> tags, I would use json_encode($field_string, JSON_HEX_QUOT|JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_APOS);

And for dynamic URLs with user input I would use htmlspecialchars(urlencode($field_string));

This is suggested in "Json: PHP to JavaScript safe or not?" and "Does urlencode() protect against XSS".
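A minimal sketch of the three encodings mentioned so far, one per output context. The helper names html_text(), js_value() and url_param() are hypothetical and only for illustration; the flags are the ones listed above, plus JSON_THROW_ON_ERROR (PHP 7.3+) as an assumption so that a failed json_encode() is not silently printed.

```php
<?php
// Hypothetical helpers, one per output context.

// HTML element content or a quoted attribute value.
function html_text(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// A value embedded in an inline <script> block.
function js_value($value): string
{
    return json_encode(
        $value,
        JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_THROW_ON_ERROR
    );
}

// A single query-string parameter that ends up inside an href attribute.
function url_param(string $value): string
{
    return htmlspecialchars(urlencode($value), ENT_QUOTES, 'UTF-8');
}

$input = '</script><b>&"\'';
echo html_text($input), "\n";   // entities instead of markup
echo js_value($input), "\n";    // \uXXXX escapes instead of < > & ' "
echo url_param($input), "\n";   // percent-encoded parameter value
```

Note that url_param() only covers a parameter value. If the user supplies the whole URL, the scheme (for example javascript:) would presumably have to be validated separately.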

Lastly, the website does not insert user input into CSS.

Is there already a crucial aspect I am missing, or am I good so far at filtering XSS attacks apart from an additional allowlist in the Content Security Policy settings? Of course, I will also test it against the cheatsheet: https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html, but maybe there is something obvious.

For example, are there more operations missing that could provide me additional safety against XSS attacks in terms of output encoding, for example, strip_tags() or specific regex operations?

Question 3

What about already validating the data before saving? For example leveraging filter_input_array, will it give any additional security, or is it unnecessary as I filter the output for XSS anyway?

1 Answer


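If you do not intend to print HTML from user input at all, you should validate and sanitize input before persisting it to the database, so "Question 3" is the way to go. Note that filter_input_array only sanitizes when you pass explicit filters; with no filter argument it applies FILTER_DEFAULT, which is FILTER_UNSAFE_RAW unless configured otherwise and leaves the data unchanged.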
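As a rough sketch of what validating with explicit filters could look like: the field names below are made up, and filter_var_array() is used instead of filter_input_array() only so the snippet runs without a real request. Both accept the same filter definition.

```php
<?php
// Hypothetical form fields standing in for $_POST.
$raw = [
    'age'   => '42',
    'email' => 'not-an-email',
    'name'  => '<b>Alice</b>',
];

// One explicit filter per field.
$definition = [
    'age'   => [
        'filter'  => FILTER_VALIDATE_INT,
        'options' => ['min_range' => 0, 'max_range' => 150],
    ],
    'email' => FILTER_VALIDATE_EMAIL,
    // FILTER_UNSAFE_RAW is what FILTER_DEFAULT resolves to by default:
    // the value passes through unchanged.
    'name'  => FILTER_UNSAFE_RAW,
];

$clean = filter_var_array($raw, $definition);

var_dump($clean['age']);   // integer after validation
var_dump($clean['email']); // false, because validation failed
var_dump($clean['name']);  // still contains the <b> tags
```

The free-text field comes back untouched, so output encoding is still needed for it even when input validation is in place.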

If you still have dangerous user input stored in your database, either use a template engine that escapes output automatically or write your own helper function, for example:

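// echo a string encoded for an HTML context
function _e($str) {
  // explicit flags and charset, so quotes are encoded on every PHP version
  echo htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
}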

Iterating over all object properties would produce unnecessary load because even fields that are not printed are encoded.
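A small sketch of a second side effect, beyond the extra load: once a field has been HTML-encoded in place, the raw value is gone, so encoding it again or reusing it in a non-HTML context gives the wrong result. The object and the field name are made up for illustration.

```php
<?php
// Hypothetical record; the field name is illustrative only.
$user = new stdClass();
$user->company = 'Fish & Chips';

// Bulk-encode up front, as the iterator approach would do.
$user->company = htmlspecialchars($user->company, ENT_QUOTES, 'UTF-8');

// 1) Encoding again at an echo site double-encodes the ampersand.
$twice = htmlspecialchars($user->company, ENT_QUOTES, 'UTF-8');

// 2) In a JavaScript context the HTML entity is carried along literally.
$js = json_encode(
    $user->company,
    JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS
);

echo $twice, "\n"; // Fish &amp;amp; Chips
echo $js, "\n";    // a JS string that decodes to "Fish &amp; Chips"
```

Encoding at the point of output, per context, avoids both problems because the stored value stays raw.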

Code Spirit