
I want to provide an HTML editor on my site, but don't want to open myself up to xss or other attacks that come with allowing user-generated HTML.

This is pretty similar to what Stack Overflow does. How is the HTML checked/sanitized here so that the styling information still remains, while other, more dangerous stuff (like javascript, iframes, etc.) are kept out?

Are there any libraries (preferably in PHP) that already do this?

hakre
Dexter
  • Probably better asked at meta.stackoverflow.com – Andreas Wong Mar 24 '12 at 16:16
  • The easiest way is to use a list of known safe and allowed HTML tags, rather than trying to filter out the bad things. – Simeon Visser Mar 24 '12 at 16:17
  • 18
    @j08691, andreas: Just because the question uses SO as an example doesn't automatically make it a meta question. – BoltClock Mar 24 '12 at 16:18
  • 1
    @Simeon Visser yes but that still leaves the question open of how to actually check the html. BoltClock Yes, thank you. This is not about SO specifically, I just chose it as the most familiar example of what I'm trying to achieve. – Dexter Mar 24 '12 at 16:21
  • The general idea is to only allow some HTML tags (and some attributes). – hakre Mar 24 '12 at 17:28
  • HTMLPurifier will do the trick; http://htmlpurifier.org/ – Pieter Mar 24 '12 at 16:19
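
The allowlist approach suggested in the comments can be sketched with PHP's built-in DOM extension: parse the input, walk the tree, keep only known tags and attributes, and serialize what is left. This is a minimal illustration under assumptions, not a substitute for a maintained library such as HTMLPurifier; the tag list, the helper names `sanitize_html` and `sanitize_node`, and the rule that `href` must start with `http:`, `https:` or `mailto:` are all choices made for the example.

```php
<?php
// Hypothetical allowlist: tag name => attributes allowed on that tag.
// Everything not listed here is removed.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'blockquote' => [], 'pre' => [], 'code' => [],
    'a' => ['href', 'title'],
];
// Tags removed together with everything inside them.
const DROP_WITH_CONTENT = ['script', 'style', 'iframe', 'object', 'embed'];

function sanitize_node(DOMNode $node): void
{
    // Snapshot the children first, because the loop modifies the tree.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is kept; saveHTML() escapes it on output
        }
        if (!($child instanceof DOMElement)) {
            $node->removeChild($child); // comments, processing instructions, ...
            continue;
        }
        $tag = strtolower($child->tagName);
        if (in_array($tag, DROP_WITH_CONTENT, true)) {
            $node->removeChild($child);
            continue;
        }
        sanitize_node($child); // clean the subtree before deciding on this tag
        if (!isset(ALLOWED_TAGS[$tag])) {
            // Unknown tag: move its (already cleaned) children up, drop the tag.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
            continue;
        }
        // Collect names first; removing while iterating the map is unsafe.
        $names = [];
        foreach ($child->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), ALLOWED_TAGS[$tag], true)) {
                $child->removeAttribute($name); // onclick, style, src, ...
            }
        }
        // Keep href only when it starts with an explicitly safe scheme.
        if ($child->hasAttribute('href')
            && !preg_match('~^(https?:|mailto:)~i', trim($child->getAttribute('href')))) {
            $child->removeAttribute('href');
        }
    }
}

function sanitize_html(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // user HTML is often malformed
    // The meta tag tells libxml that the input is UTF-8.
    $doc->loadHTML('<!DOCTYPE html><html><head><meta http-equiv="Content-Type"'
        . ' content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    sanitize_node($body);

    // Serialize only what ended up inside <body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Disallowed tags such as `<div>` are unwrapped (their content is kept), while `<script>`, `<style>`, `<iframe>`, `<object>` and `<embed>` are dropped along with their content. Under this strict rule, relative links lose their `href`.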

4 Answers


PHP has a function, `strip_tags`, that strips HTML and PHP tags from a string and lets you specify which tags to allow. But as @webarto states, there are libraries that do this better.

From the PHP Manual.

Whymarrh
  • 4
    That is not the solution... http://htmlpurifier.org/ – Dejan Marjanović Mar 24 '12 at 16:18
  • @webarto How would you write, for example, `>` and `<` so that they are visible in the result? – Roko C. Buljan Mar 24 '12 at 16:22
  • @RokoC.Buljan `htmlspecialchars` or `htmlentities`... – Dejan Marjanović Mar 24 '12 at 16:25
  • @Roko C. Buljan: Since the input is already HTML, the user (or the HTML editor on the page) typing in the content would have to take care of escaping characters like "<". – Dexter Mar 24 '12 at 16:28
  • 1
    `strip_tags` alone doesn't answer the question correctly as it doesn't modify tag attributes. One can still abuse tag attributes if you use only this method. Make sure whatever solution you end up using sanitizes also attrs such as onmouseover, href, onclick... Best sanitization is no sanitization, if it suits your needs consider using a simple alternative such as markdown with safe mode on. – eymen Dec 11 '15 at 16:40
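
The point about attributes is easy to check. The sketch below uses illustrative input (not from the original posts) to show `strip_tags` removing a disallowed `<script>` tag while leaving an `onmouseover` handler on the allowed `<p>` untouched, and `htmlspecialchars` turning `<` and `>` into entities so they display literally, as suggested in the comments.

```php
<?php
// strip_tags() removes disallowed *tags* but copies allowed tags verbatim,
// attributes included, so event handlers survive.
$input = '<p onmouseover="alert(1)">Hello</p><script>alert(2)</script>';
$stripped = strip_tags($input, '<p>');
echo $stripped, "\n";

// htmlspecialchars() converts markup characters into entities
// instead of removing them, so "<" and ">" show up as text.
$escaped = htmlspecialchars('a < b > c', ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // a &lt; b &gt; c
```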

You can use

`strip_tags($yourData, "<a><p><div><i>"); // add any more tags you want to keep`

If you're using SQL too, use

`mysql_real_escape_string($data);`

This is really all you need to avoid being injected. Do keep in mind that when using `mysql_real_escape_string`, you need to use `stripslashes` to remove the slashes when you echo the data out.

Here are the docs for `strip_tags` and the docs for `mysql_real_escape_string`.
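
The `mysql_*` functions were deprecated in PHP 5.5 and removed in PHP 7.0. A current equivalent is a PDO prepared statement: the value is sent separately from the SQL text, so there is nothing to escape in the query and no `stripslashes` step on the way out. This is a sketch under assumptions: the `posts` table and the in-memory SQLite DSN are hypothetical stand-ins that let it run on its own (it needs the pdo_sqlite extension); with MySQL you would swap in a `mysql:` DSN and your own schema.

```php
<?php
// In-memory SQLite keeps the example self-contained (hypothetical setup).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

// Hypothetical user input containing quotes and a backslash.
$userHtml = "<p>It's a \"test\" with a \\ backslash</p>";

// The value is bound as a parameter, never spliced into the SQL string.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $userHtml]);

// Read it back with another bound parameter.
$select = $pdo->prepare('SELECT body FROM posts WHERE id = :id');
$select->bindValue(':id', 1, PDO::PARAM_INT);
$select->execute();
$stored = $select->fetchColumn();

// The stored text is exactly what was inserted: no slashes to strip.
echo $stored, "\n";
```

This only protects the query. The stored HTML still has to be sanitized (or escaped) before it is rendered, because SQL escaping does nothing against XSS.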

Jordan Schnur

If you wish to allow some (X)HTML and restrict only tags viewed as unsafe, you can use something like KSES. WordPress uses a solution like this.

http://sourceforge.net/projects/kses/
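
Besides allowlisting tags and attributes, kses-style filters also restrict which URL protocols may appear in attributes such as `href`. The helper below is a hypothetical, simplified illustration of that idea, not kses's actual code or API; the function name `is_allowed_url` and the default protocol list are assumptions.

```php
<?php
// Hypothetical helper in the spirit of kses's protocol filtering.
function is_allowed_url(string $url, array $protocols = ['http', 'https', 'mailto']): bool
{
    // Decode entities in case the value came from raw markup,
    // so "&#106;avascript:" is seen as "javascript:".
    $decoded = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Browsers ignore tabs/newlines inside URLs and leading control characters;
    // removing every byte in 0x00-0x20 is stricter than that, erring on the safe side.
    $normalized = preg_replace('/[\x00-\x20]+/', '', $decoded);
    // A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $normalized, $m)) {
        return in_array(strtolower($m[1]), $protocols, true);
    }
    // No scheme: a relative URL such as "/about" or "page.html".
    return true;
}
```

Relative URLs are accepted because they cannot introduce a new scheme. The check works best on attribute values that an HTML parser has already decoded; the extra decoding and whitespace stripping can only make it reject more, not less.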

technosis

As an addendum to Whymarrh's post, I suggest having the code work take place in a subfolder of your site and automatically altering any code that contains "..", "http://", or any MySQL commands.

lilHar