12

I developed a web application that lets my users manage some aspects of a website dynamically (yes, a kind of CMS) in a LAMP environment (Debian, Apache, PHP, MySQL).

For example, they create a news item in their private area on my server, and it is then published on their website via a cURL request (or via Ajax).

The news item is written with a WYSIWYG editor (FCKeditor at the moment, probably TinyMCE in the near future).

So I can't disallow HTML tags, but how can I be safe? Which tags MUST I remove (JavaScript?)? That covers being server-safe, but how can I be 'legally' safe? If a user uses my application to carry out XSS, could I get into legal trouble?

bdukes
Strae

10 Answers

14

If you are using PHP, an excellent solution is HTMLPurifier. It has many options for filtering out bad stuff and, as a side effect, guarantees well-formed HTML output. I use it to view spam, which can be a hostile environment.

DGM
  • I decided to go this way, plus some steps of my own. I must give my customers total freedom to use HTML tags (because of the WYSIWYG editor), restricting only certain things. I hope keeping it up to date against the latest security holes won't be too much trouble. – Strae Apr 01 '09 at 07:40
13

It doesn't really matter what you're looking to remove; someone will always find a way to get around it. For reference, take a look at this XSS Cheat Sheet.

As an example, how are you ever going to remove this valid XSS attack:

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>

Your best option is to allow only a subset of acceptable tags and remove everything else. This practice is known as whitelisting and is the best method for preventing XSS (besides disallowing HTML).

Also use the cheat sheet in your testing; fire as much as you can at your website and try to find some ways to perform XSS.
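To illustrate why searching for a literal string such as `javascript:` does not catch the payload above, here is a minimal sketch. The helper `decodeHexReferences` is a hypothetical name invented for this example; it imitates, for single-byte values only, the lenient decoding of character references with no trailing semicolon that the cheat sheet entry relies on.

```php
<?php
// Payload copied from the answer above: "javascript:alert('XSS')" written as
// hexadecimal character references without trailing semicolons.
$payload = '<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>';

// A naive blacklist check on the raw text finds nothing suspicious.
$foundRaw = stripos($payload, 'javascript:') !== false;

// Hypothetical helper (not a library function): decode one- or two-digit hex
// character references, treating the trailing semicolon as optional.
function decodeHexReferences(string $text): string
{
    return preg_replace_callback(
        '/&#x([0-9a-f]{1,2});?/i',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $text
    );
}

$decoded = decodeHexReferences($payload);
$foundDecoded = stripos($decoded, 'javascript:') !== false;

var_dump($foundRaw);     // bool(false)
var_dump($foundDecoded); // bool(true)
echo $decoded, "\n";     // <IMG SRC=javascript:alert('XSS')>
```

The same text passes the check in one encoding and fails it in another, which is why a filter that matches known-bad strings is easy to get around.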

Gavin Miller
6

The general best strategy here is to whitelist specific tags and attributes that you deem safe, and escape/remove everything else. For example, a sensible whitelist might be <p>, <ul>, <ol>, <li>, <strong>, <em>, <pre>, <code>, <blockquote>, <cite>. Alternatively, consider human-friendly markup like Textile or Markdown that can be easily converted into safe HTML.
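As a rough sketch of the whitelist idea, the following uses PHP's built-in DOM extension to parse the input and rebuild it from allowed tags only. The function names `renderWhitelisted` and `sanitizeHtml` are hypothetical, every attribute is dropped for simplicity, and void elements such as `<br>` are not handled; a maintained library would be the safer choice in production.

```php
<?php
// Tags from the example whitelist in the answer above.
$allowedTags = ['p', 'ul', 'ol', 'li', 'strong', 'em', 'pre', 'code', 'blockquote', 'cite'];

// Hypothetical helper: re-serialize a DOM node, keeping whitelisted tags
// (without any attributes) and escaping all text.
function renderWhitelisted(DOMNode $node, array $allowedTags): string
{
    if ($node instanceof DOMText) {
        return htmlspecialchars($node->nodeValue, ENT_QUOTES, 'UTF-8');
    }
    if (!($node instanceof DOMElement)) {
        return ''; // comments and other node types are dropped
    }
    $tag = strtolower($node->nodeName);
    if ($tag === 'script' || $tag === 'style') {
        return ''; // drop these together with their content
    }
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= renderWhitelisted($child, $allowedTags);
    }
    // Non-whitelisted tags are unwrapped: their text survives, the tag does not.
    return in_array($tag, $allowedTags, true) ? "<$tag>$inner</$tag>" : $inner;
}

// Hypothetical entry point. The <meta> prefix asks libxml to read the input as UTF-8.
function sanitizeHtml(string $html, array $allowedTags): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc->documentElement ? renderWhitelisted($doc->documentElement, $allowedTags) : '';
}

$dirty = '<p onclick="evil()">Hello <strong>world</strong></p><script>alert(1)</script>';
echo sanitizeHtml($dirty, $allowedTags), "\n";
```

Because the output is rebuilt from the parsed tree, nothing reaches it unless the code explicitly writes it, so unknown tags and attributes are excluded by default.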

John Feminella
  • Can't you still insert scripts in the allowed tags using a white-list? – jeroen Mar 31 '09 at 15:38
  • That depends on how you're escaping them. If you're describing something like "ipt ...", I'd first note that "" and it's also escaped/removed. – John Feminella Mar 31 '09 at 15:45
  • I was thinking more about the attributes, but I guess that depends if your white-list has any tags that need them, so you would have to allow them. If you allow attributes, you'd have to get rid of the whole onclick="", etc. range, but I guess that's pretty obvious :) – jeroen Mar 31 '09 at 15:54
  • Oh, absolutely. You have to whitelist attributes separately, though, just like you whitelist each tag. (That's the price you pay for being explicit.) – John Feminella Mar 31 '09 at 16:18
2

Rather than allow HTML, you should have some other markup that can be converted to HTML. Trying to strip out rogue HTML from user input is nearly impossible. For example:

<scr<script>ipt etc="...">

Removing <script> from this will leave

<script etc="...">
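The splice described above can be reproduced with a single call to PHP's `str_replace`. This is only a sketch of the failure mode; the variable names are illustrative, and the escaped variant is shown for contrast, not as a complete defence.

```php
<?php
// Input copied from the answer above.
$input = '<scr<script>ipt etc="...">';

// One pass: removing the inner "<script>" splices the outer pieces together.
$onePass = str_replace('<script>', '', $input);
echo $onePass, "\n"; // <script etc="...">

// Escaping instead of removing leaves nothing to splice back together.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &lt;scr&lt;script&gt;ipt etc=&quot;...&quot;&gt;
```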
cjk
  • Using a white list rather than a black list would solve this problem. – Gumbo Mar 31 '09 at 15:37
  • see the img tag answer in http://stackoverflow.com/questions/701580/how-can-i-allow-my-user-to-insert-html-code-without-risks-not-only-technical-r/701609#701609 – cjk Mar 31 '09 at 15:44
  • XSS is also possible through other markup languages, such as BBcode, so that doesn't really fix anything. A whitelist approach works pretty well. – troelskn Mar 31 '09 at 16:17
1

Kohana's security helper is pretty good. From what I remember, it was taken from a different project.

However, I tested it with

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>

from LFSR Consulting's answer, and it escaped it correctly.

Community
alex
1

For a C# example of the whitelist approach, which Stack Overflow uses, you can look at this page.

Çağdaş Tekin
0

I use PHP's strip_tags function because I want users to be able to post safely. I allow only a few tags in posts, so nobody can hack the website through script injection. I think strip_tags is the best option.

Click here for the code for this PHP function

0

If removing the tags is too difficult, you could reject the whole HTML input until the user enters something valid. I would reject HTML if it contains any of the following tags:

frameset,frame,iframe,script,object,embed,applet.

Other tags you will want to disallow are head (and its sub-tags), body, and html, because you want to provide them yourself and do not want the user to manipulate your metadata.

But generally speaking, allowing users to provide their own HTML code always poses some security risks.
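A minimal sketch of the reject-on-sight check described in this answer might look like the following. The constant `REJECTED_TAGS` and the function `containsRejectedTag` are hypothetical names for this example, and the regular expression only looks at tag names, so it says nothing about attributes.

```php
<?php
// Tag names taken from the answer above.
const REJECTED_TAGS = ['frameset', 'frame', 'iframe', 'script', 'object',
                       'embed', 'applet', 'head', 'body', 'html'];

// Hypothetical helper: true when the markup opens or closes any rejected tag.
// The \b keeps <header> from being mistaken for <head>.
function containsRejectedTag(string $html): bool
{
    $names = implode('|', REJECTED_TAGS);
    return preg_match('~<\s*/?\s*(?:' . $names . ')\b~i', $html) === 1;
}

var_dump(containsRejectedTag('<p>ok</p>'));                 // bool(false)
var_dump(containsRejectedTag('<IFRAME src="x"></iframe>')); // bool(true)
var_dump(containsRejectedTag('<header>news</header>'));     // bool(false)
var_dump(containsRejectedTag('<p onclick="x()">hi</p>'));   // bool(false)
```

The last call shows the limit of a tag-name check: an event-handler attribute on an otherwise harmless tag is accepted, so this kind of test would need to be combined with attribute filtering.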

codymanix
0

You might want to consider, rather than allowing HTML at all, implementing some stand-in for HTML like BBCode or Markdown.

chaos
-2


This is a very good PHP function that you can use:

$string = strip_tags($_POST['comment'], "<b>");
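One caveat worth checking before relying on the call above: `strip_tags` removes disallowed tags but leaves the attributes of allowed tags untouched, and it keeps the text that was inside removed tags. A small sketch, using a made-up comment string in place of `$_POST['comment']`:

```php
<?php
// Hypothetical submitted comment; the allowed <b> tag carries an event handler.
$comment = '<b onmouseover="alert(1)">hello</b><script>alert(2)</script>';

$result = strip_tags($comment, '<b>');

echo $result, "\n"; // <b onmouseover="alert(1)">hello</b>alert(2)
```

The `<script>` tags are gone, but the `onmouseover` handler on the allowed `<b>` tag survives, so any allowed tag would still need its attributes filtered separately.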
Sammitch