19

When you can simply encode the data using HttpUtility.HtmlEncode, why should you use AntiXss.HtmlEncode?

Why is a whitelist approach better than blacklisting?

Also, in the Anti XSS library, where do I specify the whitelist?

Nick

4 Answers

20

You can't specify or alter the white list with the AntiXSS library, which is not strange when you think about it. By default, the AntiXSS library encodes every character outside the following range: 0..9a..zA..Z. These characters are safe (and are therefore on the white list), so there is no need to encode them. Please note that the AntiXSS library uses different lists for encoding JavaScript, HTML, and URLs. Please don't use HTML encoding for URLs, because that leaves a security hole in your application.

Please note that the white list for HtmlEncode works differently from the white list for GetSafeHtmlFragment. With HtmlEncode you say 'please encode every character that's not on the white list'; with GetSafeHtmlFragment you say 'please remove all tags and attributes that are not on the white list'.

If you're using ASP.NET 4.0, I'd advise you not to use the AntiXSS library directly, but simply to use the built-in mechanisms (such as HttpUtility) to encode HTML. ASP.NET 4.0 allows you to configure an HttpEncoder in the configuration file. You can write your own HttpEncoder that uses the AntiXSS library (it's likely that a future version of the AntiXSS library will contain an HttpEncoder implementation). By doing this, your whole application (including all ASP.NET controls and custom controls) will use white list encoding instead of black list encoding.
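As a rough sketch of the custom HttpEncoder idea described above: the class below derives from `System.Web.Util.HttpEncoder` and forwards HTML and attribute encoding to `Microsoft.Security.Application.Encoder`. The class name `AntiXssHttpEncoder` and the namespace/assembly name `MyApp` are hypothetical, and the sketch assumes a reference to an AntiXSS version that exposes the `Encoder` type. It is not the library's own implementation.

```csharp
using System.IO;
using System.Web.Util;

namespace MyApp // hypothetical namespace
{
    // Hypothetical encoder that routes ASP.NET's built-in HTML encoding
    // through the AntiXSS library's white list based methods.
    public class AntiXssHttpEncoder : HttpEncoder
    {
        protected override void HtmlEncode(string value, TextWriter output)
        {
            if (value == null) return; // nothing to write
            // Fully qualified to avoid clashing with other types named Encoder.
            output.Write(Microsoft.Security.Application.Encoder.HtmlEncode(value));
        }

        protected override void HtmlAttributeEncode(string value, TextWriter output)
        {
            if (value == null) return;
            output.Write(Microsoft.Security.Application.Encoder.HtmlAttributeEncode(value));
        }
    }
}
```

The encoder would then be registered in web.config through the `encoderType` attribute of `httpRuntime` (type and assembly names again being placeholders):

```xml
<system.web>
  <httpRuntime encoderType="MyApp.AntiXssHttpEncoder, MyApp" />
</system.web>
```

With this in place, calls such as `HttpUtility.HtmlEncode` and the `<%: %>` syntax should go through the custom encoder without any change to the calling code.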

ASP.NET 4.0 also introduces a new code block for encoded text. You can use First Name: <%: Model.FirstName %>. However, I personally find <%= HttpUtility.HtmlEncode(Model.FirstName) %> more explicit.

Steven
  • Actually the range of characters "left alone" by AntiXSS is larger than the ASCII alphabet. We support a few of the more common Unicode/UTF language characters and are looking to expand it to full Unicode 5 support. It's a good point about the HttpEncoder, we're having a sprint meeting today, I'll add that to the features list, as I don't remember seeing it in the codebase (but I've only joined two weeks ago) – blowdart Feb 24 '10 at 13:59
  • I see you're a "Microsoft employee working on AntiXSS", but I had to check using Reflector ;-). Of course you're right: the white list is bigger than I thought. Thanks for bringing this up. It would be great if an HttpEncoder implementation could be added to the library, but I couldn't imagine it not being on your list already. Nice to be part of the process ;-) – Steven Feb 24 '10 at 14:35
  • Well it went on the list once you mentioned it. I've only been here a month, and we're doing the first sprint planning for it next week now it's mine, all mine, muhahahahahaha. – blowdart Mar 05 '10 at 17:14
  • @blowdart Why doesn't AntiXss escape characters as \x3c and so on? According to the middle of this video, it should: http://channel9.msdn.com/Events/MIX/MIX10/FT05. I tested Microsoft.Security.Application.Encoder.HtmlEncode() but couldn't get it to work. – CallMeLaNN Mar 31 '11 at 06:40
  • @blowdart: Why did you choose the most unfortunate name of `Encoder` for the main type of the AntiXss library? There are already 2 types in the BCL with that name. – Steven Mar 31 '11 at 07:32
  • @Steven because that's what it is and does *shrug* – blowdart Mar 31 '11 at 12:55
  • @blowdart: It's probably not your error, but someone must have been sleeping. The Framework Design Guidelines state that you should prevent naming collisions in public types. There already was a naming collision (because there are already 2 types with that name), but your team made it worse. The guideline is there for a reason. The problem is that this design has the opposite effect of what you intended: it makes code less readable, because we must sometimes fully qualify the `Encoder` type name. – Steven Mar 31 '11 at 15:29
  • It should also be noted here that `GetSafeHtmlFragment` was vulnerable and the new version is just useless, as you yourself wrote in your good answer, which also includes a 'workaround', [in this post](http://stackoverflow.com/questions/12554194/how-to-properly-sanitize-content-with-antixss-library). – BornToCode Jan 19 '15 at 17:19
9

White lists are always more secure than blacklists. Just think about which is more secure: keeping a list of all the people who are not allowed into your party, or only letting in those who are on the guest list. (Basically, blacklists can only handle attacks that are obvious or have been used before.)
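To make the difference concrete, here is a minimal, self-contained sketch. Both methods are hypothetical illustrations (neither is how HttpUtility or AntiXSS is actually implemented). The blacklist version replaces only the characters its author thought of; the whitelist version leaves 0-9, a-z and A-Z alone and turns everything else into a numeric entity.

```csharp
using System;
using System.Text;

public static class EncodingSketch
{
    // Blacklist: encode only a fixed set of known-dangerous characters.
    // Anything the author forgot (here: the single quote) passes through.
    public static string BlacklistEncode(string input)
    {
        return input.Replace("&", "&amp;")
                    .Replace("<", "&lt;")
                    .Replace(">", "&gt;")
                    .Replace("\"", "&quot;");
    }

    // Whitelist: keep known-safe characters, encode everything else.
    public static string WhitelistEncode(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool safe = (c >= '0' && c <= '9')
                     || (c >= 'a' && c <= 'z')
                     || (c >= 'A' && c <= 'Z');
            if (safe)
            {
                sb.Append(c);
                continue;
            }

            // Combine surrogate pairs so characters outside the BMP
            // become a single numeric entity.
            int codePoint = c;
            if (char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, input[i + 1]);
                i++;
            }
            sb.Append("&#").Append(codePoint).Append(';');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Payload aimed at a single-quoted HTML attribute.
        string input = "' onmouseover='alert(1)";
        Console.WriteLine(BlacklistEncode(input));
        Console.WriteLine(WhitelistEncode(input));
    }
}
```

The blacklist method returns the payload unchanged because none of its four characters appear in it, so the single quotes can still break out of an attribute. The whitelist method encodes the quotes, spaces, equals sign and parentheses without needing to know in advance that they were dangerous.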

ternaryOperator
  • So, if I have untrusted input from a user that is stored in a datastore and displayed to a user at a later time, do I encode only when I display the data? (Or do I encode it when I save the data as well?) – Nick Jan 07 '10 at 17:54
  • Well generally you should use specific checks before putting stuff into the database (in case of SQL injection) and before display (in case of XSS (e.g. javascript)). Generally a best practice is to specifically name all variables containing any data from the user (e.g. usrName or usrEmail) to prevent accidentally executing user input. – ternaryOperator Jan 07 '10 at 18:21
  • It is sad this got accepted since it is wrong in every way. Everyone please read the other answers which will actually help you make your website secure. – Hogan Oct 07 '11 at 21:38
  • ternaryOperator's answer isn't wrong in any way. It's just incomplete. 1. Whitelists are better than blacklists. True. 2. Check data before adding it to a database/datastore. True. 3. Check data before display. True. 4. Isolate untrusted data. True. 5. Don't execute untrusted input. True. – Douglas Held Feb 17 '12 at 17:21
  • It's important to do a few key things with data input and output. First, validate every piece of data that enters the program, i.e. all data that wasn't created by the programmer. For example: `public string getSafeLang( string usrLang ){//switch case through known languages; case else: return safe default lang;}` Second: put all your validation functions in a common library. Third: use prepared statements; don't allow string concatenation in SQL. Fourth: use correct output encoding for all web responses. Microsoft AntiXSS is a great way to do this. If you do these 4, you are 50%+ there. – Douglas Held Feb 17 '12 at 17:32
  • Next: when you get some money together for a visit by a real security expert, have him or her review all your validator functions. – Douglas Held Feb 17 '12 at 17:37
1

The AntiXss library also includes Encode methods for other contexts, such as JavaScript or HTML attributes.
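A brief sketch of what picking the encoder per output context might look like, assuming an AntiXSS version that exposes the static `Microsoft.Security.Application.Encoder` class (older releases used a class named `AntiXss` instead). The wrapper class and method names are hypothetical; check the exact overloads and quoting behaviour against the version you reference.

```csharp
using Microsoft.Security.Application;

// Hypothetical helper: one method per output context.
public static class ContextEncodingSketch
{
    // Untrusted text placed between HTML tags.
    public static string ForHtmlBody(string untrusted)
    {
        return Encoder.HtmlEncode(untrusted);
    }

    // Untrusted text placed inside a quoted HTML attribute value.
    public static string ForHtmlAttribute(string untrusted)
    {
        return Encoder.HtmlAttributeEncode(untrusted);
    }

    // Untrusted text emitted as a JavaScript string literal.
    public static string ForJavaScript(string untrusted)
    {
        return Encoder.JavaScriptEncode(untrusted);
    }

    // Untrusted text used as a URL query string value.
    public static string ForUrlValue(string untrusted)
    {
        return Encoder.UrlEncode(untrusted);
    }
}
```

The point of the separation is that each context has its own set of safe characters, so output encoded for one context should not be reused in another.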

SLaks
0

I tried the AntiXss library, and it worked well for removing script tags, but it failed to sanitize other HTML. See the example below:

<a href="http://west-wind.com">West Wind</a><br>Hello<br>Please login with the form below before proceeding:<form action="”mybadsite.aspx”"><table><tbody><tr><td>Login:</td><td><input type="text" name="x_x_x_x_x_x_x_x_x_x_x_x_x_login"></td></tr><tr><td>Password:</td><td><input type="text" name="x_x_x_x_x_x_x_x_x_x_x_x_x_password"> </td></tr></tbody></table><input type="submit" value="LOGIN"></form>