Why use a whitelist for HTML sanitizing?

Question

I've often wondered -- why use a whitelist as opposed to a blacklist when sanitizing HTML input?

How many sneaky HTML tricks are there to open XSS vulnerabilities? Obviously script tags and frames are not allowed, and a whitelist would be used for the attributes of HTML elements, but why disallow almost everything else?

score 24 · Accepted Answer · answered Mar 19 '10 at 08:14

If you leave something off a whitelist, then you just break something that wasn't important enough for you to think about in the first place.

If you leave something off a blacklist, then you've opened a big security hole.

If browsers add new features, then your blacklist becomes out of date.

Quentin

Ah -- the 'less room for human error' aspect had occurred to me (and of course I use a whitelist), I'm just curious about how fundamental this part of security really is – Carson Myers Mar 19 '10 at 08:18

@Carson: The fundamental part is the "if browsers add new features". There's just *no* way for you to predict this. One might also argue that catering to human error is *the* fundamental idea of security in general, hence the "if you leave something off a whitelist". – sleske Mar 19 '10 at 08:33

Also, even if you keep on top of new browser features, there's the problem of undocumented features (see e.g. Ikke's answer), which might bite you. – sleske Mar 19 '10 at 08:33
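
The failure modes described in this answer can be sketched in a few lines of Python. This is only an illustration: both tag sets are hypothetical, and `futuretag` stands in for some element a browser might add later.

```python
# Hypothetical lists, for illustration only; not a vetted security policy.
BLACKLISTED_TAGS = {"script", "iframe", "frame", "object", "embed"}
WHITELISTED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}


def blacklist_allows(tag: str) -> bool:
    """Default-allow: anything not explicitly forbidden gets through."""
    return tag.lower() not in BLACKLISTED_TAGS


def whitelist_allows(tag: str) -> bool:
    """Default-deny: anything not explicitly permitted is rejected."""
    return tag.lower() in WHITELISTED_TAGS


def compare(tags):
    """Map each tag to (allowed by blacklist, allowed by whitelist)."""
    return {tag: (blacklist_allows(tag), whitelist_allows(tag)) for tag in tags}
```

With these lists, `compare(["table", "style", "futuretag"])` gives `(True, False)` for all three tags. A `table` missing from the whitelist only breaks some formatting, while a `style` or `futuretag` missing from the blacklist passes straight through.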

score 5 · Answer 2 · answered Mar 19 '10 at 08:12

Just read something about that yesterday. It's in the manual of feedparser.

A snippet:

The more I investigate, the more cases I find where Internet Explorer for Windows will treat seemingly innocuous markup as code and blithely execute it. This is why Universal Feed Parser uses a whitelist and not a blacklist. I am reasonably confident that none of the elements or attributes on the whitelist are security risks. I am not at all confident about elements or attributes that I have not explicitly investigated. And I have no confidence at all in my ability to detect strings within attribute values that Internet Explorer for Windows will treat as executable code. I will not attempt to preserve “just the good styles”. All styles are stripped.

There is a serious risk if you only blacklist some elements and forget an important one. When you whitelist only tags you know to be secure, there is less risk of letting in something that can be abused.

A good point, although I can't think of anything off the top of my head that could cause security risks and that I don't already know about. Could you point to a resource on such seemingly innocent but somehow exploitable HTML elements? – Carson Myers Mar 19 '10 at 08:17
@Carson - yes, but what about the new exploit that comes along tomorrow? If you've got a tight whitelist - no updates required. If you've got a blacklist in 30 applications - lots of updates – Damien_The_Unbeliever Mar 19 '10 at 08:20

score 5 · Answer 3 · answered Mar 19 '10 at 08:26

Even though script tags and frame tags are not allowed, you can still use any other tag, even a made-up one, with an event handler attribute, like this:

<test onmouseover=alert(/XSS/)>mouse over this</test>

and it works in many browsers.

YOU

this is a good example, but of course with whitelisting or blacklisting, element fields like "onmouseover" and the like would be stripped from tags anyway – Carson Myers Mar 19 '10 at 08:39
True, I just wanted to make a point about tags; onmouseover is just the one that came to mind at the moment. – YOU Mar 19 '10 at 08:43

score 3 · Answer 4 · answered Mar 19 '10 at 08:15

Because then you are sure that you don't miss anything. By explicitly allowing some tags, you obviously have more control over what is allowed.

Whitelists are used in most security-related areas. Think about firewalls: the first rule is to block all (incoming) traffic, and then you open only the ports that are supposed to be open. This makes it far more secure.

score 2 · Answer 5 · answered Mar 19 '10 at 08:13

Because other tags can break the layout of a page. Imagine what would happen if someone injected a <style> tag. The <object> tag is also dangerous.

Pavel Nikolov

That's true, `` could do it also I suppose – Carson Myers Mar 19 '10 at 08:16
This doesn't really answer the question, a blacklist could stop those tags too. – Andy E Mar 19 '10 at 08:17
@Andy it could but I think this also adds to the point that there's _so many things_ to consider that it's far too easy to write something off as safe. Obviously style tags would be disallowed, but to be honest I might have forgotten that ` – Carson Myers Mar 19 '10 at 08:20

Basil Musa · Answer 6 · 2015-12-02T16:02:42.793

I prefer to have both, I call it the "Black List with Relaxed White List" approach:

Create a relaxed "White List" of tags & attributes.
Create a "Black List for the White List": every tag/attribute in the black list SHOULD exist in the White List you created, or else an error is raised.

This black list acts as an on-off switch for tags/attributes in the relaxed white list.

This "Black List with Relaxed White List" approach makes it much easier to configure the sanitizing filter.

As an example, the White List can contain all HTML5 tags and attributes, while the Black List contains the tags & attributes to be excluded.
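
One way to read the approach above is as a set difference with a sanity check. The sketch below assumes that reading; `RELAXED_WHITELIST` is a small hypothetical stand-in for "all HTML5 tags", and `effective_whitelist` is a name invented for this example.

```python
# Hypothetical stand-in for a relaxed whitelist of "all HTML5 tags".
RELAXED_WHITELIST = {"p", "b", "i", "a", "img", "style", "object", "table"}


def effective_whitelist(whitelist, blacklist):
    """Switch off blacklisted entries of the whitelist.

    Raises ValueError if the blacklist names something the whitelist does
    not contain, which usually means a typo or a stale configuration.
    """
    unknown = set(blacklist) - set(whitelist)
    if unknown:
        raise ValueError(f"blacklist entries not in whitelist: {sorted(unknown)}")
    return set(whitelist) - set(blacklist)
```

For example, `effective_whitelist(RELAXED_WHITELIST, {"style", "object"})` leaves `p`, `b`, `i`, `a`, `img` and `table`, while a misspelled entry such as `"stlye"` raises an error instead of being silently ignored. Note that the result is still a whitelist: anything outside `RELAXED_WHITELIST` stays rejected by default.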

score 0 · Answer 7 · answered Jan 12 '16 at 19:57

The more you allow, the more tricks are left for clever hackers to inject some nasty code into your webpage. That's why you want to allow as little as possible.

See Ruben van Vreeland's lecture How We Hacked LinkedIn & What Happened Next for a good introduction to XSS vulnerabilities and why you want your whitelist to be as strict as possible!
