I seem to have trouble understanding when to use htmlspecialchars().

Let's say I do the following when I am inserting data:

$_POST = filter_input_array(INPUT_POST, [
    'name' => FILTER_SANITIZE_STRING,
    'homepage' => FILTER_DEFAULT // do nothing
]);

$course = new Course();
$course->name = trim($_POST['name']);
$course->homepage = $_POST['homepage']; // may contain unsafe HTML

$courseDAO = DAOFactory::getCourseDAO();
$courseDAO->addCourse($course);  // simple insert statement

When I output, I do the following:

$courseDAO = DAOFactory::getCourseDAO();
$course = $courseDAO->getCourseById($_GET['id']);
?>

<?php ob_start() ?>

<h1><?= $course->name ?></h1>
<div class="homepage"><?= $course->homepage ?></div>

<?php $content = ob_get_clean() ?>

<?php include 'layout.php' ?>

I would like that $course->homepage be treated and rendered as HTML by the browser.

I've been reading answers on this question. Should I be using htmlspecialchars() anywhere here?

Mikey
    I personally run everything through `trim( htmlspecialchars( strip_tags( $input ) ) )` every time I get something from the client. I *never* trust client data. So in your case, I would be doing that to your `$_POST[ 'name' ]` and `$_POST[ 'homepage' ]`. I want to be sure that whoever is handling data is handling safe data. Some people prefer to save the very raw version of what the client sent; I don't. I have dealt with huge contests with a lot of end-users, all with different information and different ways of entering their data. Whenever I had a problem, I was able to find the source and resolve it. – ascx Apr 28 '16 at 12:26

3 Answers

There are (from a security POV) three types of data that you might output into HTML:

  • Text
  • Trusted HTML
  • Untrusted HTML

(Note that HTML attributes and certain elements are special cases, e.g. onclick attributes expect HTML encoded JavaScript so your data needs to be HTML safe and JS safe).

If it is text, then use htmlspecialchars to convert it to HTML.
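As a minimal sketch of the text case: the helper name `e()` and the sample course name below are made up for illustration; only `htmlspecialchars()` itself comes from the answer. Passing the flags and charset explicitly avoids depending on defaults that differ between PHP versions.

```php
<?php
// Hypothetical helper (name is an assumption): escape plain text for HTML output.
function e(string $text): string
{
    // ENT_QUOTES also escapes single quotes, so the result is safe inside
    // quoted attribute values; ENT_SUBSTITUTE replaces invalid UTF-8
    // sequences instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$name = 'Maths <b>101</b> & "friends"';
echo '<h1>' . e($name) . '</h1>';
// prints: <h1>Maths &lt;b&gt;101&lt;/b&gt; &amp; &quot;friends&quot;</h1>
```

In the question's template this would correspond to something like `<h1><?= e($course->name) ?></h1>`.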

If it is trusted HTML, then just output it.

If it is untrusted HTML then you need to sanitise it to make it safe. That generally means parsing it with a DOM parser, and then removing all elements and attributes that do not appear on a whitelist as safe (some attributes may be special cased to be filtered rather than stripped), and then converting the DOM back to HTML. Tools like HTML Purifier exist to do this.
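The parse, filter, and re-serialise steps described above could look roughly like the following sketch, which uses PHP's bundled DOM extension. The function names, the whitelist format, and the rule that `href` must start with `http://` or `https://` are all assumptions made for illustration. It drops disallowed elements together with their contents and is far less thorough than a dedicated library such as HTML Purifier, so treat it as a demonstration of the idea rather than something to deploy.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: keep only whitelisted elements and attributes.
 * $allowed maps a tag name to the list of attributes permitted on it.
 */
function sanitizeHtml(string $dirty, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
    // The XML declaration is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><body>' . $dirty . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    filterChildren($body, $allowed);

    // Convert the filtered DOM back to an HTML fragment.
    $clean = '';
    foreach ($body->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}

function filterChildren(DOMNode $parent, array $allowed): void
{
    // Copy the live node list first, because nodes are removed while looping.
    foreach (iterator_to_array($parent->childNodes, false) as $node) {
        if ($node instanceof DOMText) {
            continue; // text nodes are kept; saveHTML() escapes them again
        }
        if (!$node instanceof DOMElement || !isset($allowed[$node->nodeName])) {
            $parent->removeChild($node); // comments, unknown tags and their contents
            continue;
        }
        foreach (iterator_to_array($node->attributes, false) as $attr) {
            $permitted = in_array($attr->nodeName, $allowed[$node->nodeName], true);
            // Assumed rule: links must be absolute http(s) URLs, which rejects javascript: URLs.
            $badUrl = $attr->nodeName === 'href'
                && preg_match('#^https?://#i', $attr->value) !== 1;
            if (!$permitted || $badUrl) {
                $node->removeAttributeNode($attr);
            }
        }
        filterChildren($node, $allowed);
    }
}

// Example whitelist: paragraphs, bold text, and links with an href.
$allowed = ['p' => [], 'b' => [], 'a' => ['href']];
echo sanitizeHtml('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>', $allowed);
```

The filtering runs on the parsed tree rather than on the raw string, which is what makes a whitelist of this kind harder to bypass than pattern matching on markup.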

$course->homepage = $_POST['homepage']; // may contain unsafe HTML

I would like that $course->homepage be treated and rendered as HTML by the browser.

Then you have the third case and need to filter the HTML.

Quentin

It looks like you're storing raw HTML in the database and then rendering it to the page later.

I wouldn't filter the data before you store it in the database: you risk corrupting the user's input, and there would be no way to retrieve the original if it were never stored.

If you want the outputted data to be treated as HTML by the browser then no, htmlspecialchars() is not the solution.

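However, it is worth thinking about using strip_tags() to remove script tags in order to combat XSS. With strip_tags() you have to whitelist the allowable tags, which is obviously tedious. Note that it leaves the attributes of allowed tags untouched, so on its own it does not stop attribute-based XSS such as onclick handlers.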
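A short sketch of what `strip_tags()` with an allow-list does and does not remove; the sample input is invented for illustration.

```php
<?php
// Sample input (made up): an allowed tag carrying an event handler, plus a script tag.
$dirty = '<p onclick="alert(1)">Hello</p><script>alert(2)</script>';

// Second argument: the tags to keep. Everything else is stripped.
$result = strip_tags($dirty, '<p>');

echo $result;
```

The `<script>` tags are removed, but the `onclick` attribute on the allowed `<p>` survives, so the output still carries executable markup. That is why a whitelist of tags alone is usually paired with attribute filtering.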

It might also be worth taking a look at TinyMCE to see how it deals with such things.

andrew

Output plain HTML only if you are sure about its contents. Use htmlspecialchars() on everything else, especially user input, to prevent security issues.

chorn