3

I'm using a textarea to get input from the user and display it on the screen. How can I make sure that if they put in something like

<h1>YAY, I hacked in</h1>

it is displayed exactly as typed and is not rendered as an <h1>? There must be a function for this. Help? :D

taevanbat
  • 1
    Check the following question: http://stackoverflow.com/questions/129677/whats-the-best-method-for-sanitizing-user-input-with-php/130323#130323 – Kristian Vitozev May 28 '13 at 14:00
  • 1
    Use an `XML Parser` on your server and strip / validate the input. **You don't use RegEx, do you!?** – jAndy May 28 '13 at 14:02
  • 1
    Create a text node, set its value as the user's input, and then append it to the page – Ian May 28 '13 at 14:02
  • 1
    possible duplicate of [What are the common defenses against XSS?](http://stackoverflow.com/questions/3129899/what-are-the-common-defenses-against-xss) – Quentin May 28 '13 at 14:04
  • 1
    Be careful: sanitising/validating in the browser can be bypassed fairly easily if someone wants to hack you. You must do similar checks in your server-side code as well. – Spudley May 28 '13 at 14:24
  • What server side technology are you using? – Martin Smith May 28 '13 at 14:38

2 Answers

2

As I commented, if you're about to send that data to a server, you should use one of the various XML Parsers available and strip + validate the input.

If, however, you need to validate purely on the client, I suggest you use document.implementation.createHTMLDocument, which creates a fully fledged DOM document in memory, detached from the page. You can then operate in there and return your validated data.

Example:

function validate( input ) {
    // Detached document: markup parsed here is never rendered on the page.
    var doc   = document.implementation.createHTMLDocument( "validate" );

    doc.body.innerHTML = input;

    // textContent already concatenates the text of all descendants exactly once;
    // collecting it per element would duplicate the text of nested elements.
    return doc.body.textContent;
}

Call it like this:

validate( "<script>EVIL!</script>" );
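
Whatever string comes back still has to be put on the page as text, not as markup. Following Ian's comment on the question, one way to do that is a text node, which the browser never parses as HTML. The following is only a minimal sketch: the helper name `showAsText`, its `doc` and `container` parameters, and the element id `preview` are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper: shows user input literally inside `container`.
// `doc` is passed in so the function works with any Document (or a test stub).
function showAsText(doc, container, userInput) {
    // Remove whatever was displayed before.
    while (container.firstChild) {
        container.removeChild(container.firstChild);
    }
    // A text node is never parsed as HTML, so "<h1>" stays as literal characters.
    var node = doc.createTextNode(userInput);
    container.appendChild(node);
    return node;
}

// In a browser (the id "preview" and the variable `textarea` are assumptions):
// showAsText(document, document.getElementById("preview"), textarea.value);
```

Setting `container.textContent = userInput` has the same effect in current browsers; the explicit text node just makes the intent visible.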
Florian Margaine
jAndy
  • How is using `document.implementation.createHTMLDocument` better than using a plain DOM element or a document fragment? – Florian Margaine May 28 '13 at 14:22
  • @FlorianMargaine it's in fact very similar to a document fragment. However, you can use anything in here that you would use in your default document. You can literally load entire HTML documents into this thing and operate on it. Should be way more lightweight than an ` – jAndy May 28 '13 at 14:25
1

You need to address this on the server side. If you filter with JavaScript at form submission time, the user can subvert your filter by creating their own page, using telnet, disabling JavaScript, using the Chrome/FF/IE console, etc. And if you filter at display time, you haven't mitigated anything; you've only moved the break-in point around on the page.

In PHP, for instance, if you wish to just dump the raw characters out with none of the user's formatting, you can use:

print htmlentities($user_submitted_data, ENT_NOQUOTES, 'utf-8');

In .NET:

someControl.InnerHtml = Server.HtmlEncode(userSubmittedData);

If you're trying to sanitize the content client-side for immediate/preview display, this should be sufficient:

out.innerHTML = user_data.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
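
The same idea can be wrapped in a reusable helper that escapes ampersands and both quote characters in addition to the angle brackets, which matters if the value ever ends up inside an attribute. This is a sketch under that assumption, and the name `escapeHtml` is hypothetical rather than a built-in function.

```javascript
// Hypothetical helper: escapes the five characters that matter in HTML text
// and in quoted attribute values. "&" must be replaced first, otherwise the
// entities produced by the later replacements would be escaped again.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

console.log(escapeHtml("<h1>YAY, I hacked in</h1>"));
// prints: &lt;h1&gt;YAY, I hacked in&lt;/h1&gt;
```

As with the one-liner, this is only suitable for a local preview; anything shown to other visitors still needs to be encoded on the server.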
svidgen
  • Bear in mind, the last suggestion doesn't sanitize the text for sending to other visitors. Its only legitimate purpose is to give the text-entering user an accurate pre-submission preview of their entry. – svidgen May 28 '13 at 16:04
  • Okay. If you're using a PHP form and submitting the information via GET, mysql_real_escape_string would be a legitimate way to sanitize the string, right? – taevanbat May 29 '13 at 09:05
  • It's a legitimate way to sanitize a string for interpolation in a [My]SQL query. You still need to perform HTML/JavaScript sanitization on inserted values before sending them to the client. – svidgen May 29 '13 at 13:39