
A user can enter HTML that will later be displayed to other users. The WYSIWYG plugin I'm using sanitizes the HTML on the front end. It removes all potentially malicious tags and attributes (script, src, anything starting with "on", etc.). I obviously need to do some validation on the back end as well.

Does anyone know of a good solution for C#? I keep seeing this one: http://roberto.open-lab.com/2010/03/04/a-html-sanitizer-for-c/, though I'm a little hesitant to use code from a random blog. Are there any well-known libraries? What do most people do in this situation?

user1652427
  • What kind of sanitization are you looking for? Strip *all* html tags, or only some of them? – System Down Mar 04 '13 at 18:52
  • possible duplicate of [How to use C# to sanitize input on an html page?](http://stackoverflow.com/questions/188870/how-to-use-c-sharp-to-sanitize-input-on-an-html-page) – jrummell Mar 04 '13 at 18:52
  • Only some. Pretty much anything that could be used maliciously. – user1652427 Mar 04 '13 at 18:54

1 Answer


You can use HtmlAgilityPack, which is a well-maintained library for parsing and manipulating HTML. The best practice is to implement a whitelist, that is, a list of allowed tags, and strip everything that is not on it. This SO question might be exactly what you need:

HTML Agility Pack strip tags NOT IN whitelist
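
As a rough illustration of the whitelist idea (not the code from the linked question), a minimal sketch with HtmlAgilityPack could look like the following. The class name `HtmlWhitelistSanitizer`, the tag and attribute lists, and the URL-scheme check are all assumptions made for this example; adjust them to whatever your editor is allowed to produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper for illustration; not part of HtmlAgilityPack itself.
public static class HtmlWhitelistSanitizer
{
    // Assumed whitelist of tags; anything else is removed with its contents.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a" };

    // Assumed whitelist of attributes per tag; tags not listed keep no attributes.
    private static readonly Dictionary<string, string[]> AllowedAttributes =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        { { "a", new[] { "href" } } };

    // Assumed set of acceptable link prefixes (relative URLs are rejected here).
    private static readonly string[] AllowedUrlPrefixes =
        { "http://", "https://", "mailto:" };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        // Snapshot the nodes first, because the tree is modified while iterating.
        var nodes = doc.DocumentNode.Descendants().ToList();

        foreach (var node in nodes)
        {
            if (node.NodeType == HtmlNodeType.Text)
                continue; // plain text is kept

            // Drop comments and every element that is not whitelisted.
            if (node.NodeType != HtmlNodeType.Element || !AllowedTags.Contains(node.Name))
            {
                node.Remove();
                continue;
            }

            string[] allowedAttrs;
            if (!AllowedAttributes.TryGetValue(node.Name, out allowedAttrs))
                allowedAttrs = new string[0];

            // Copy the attribute list before removing items from it.
            foreach (var attr in node.Attributes.ToList())
            {
                bool nameOk = allowedAttrs.Contains(attr.Name, StringComparer.OrdinalIgnoreCase);

                // Compare the literal start of the value, so the check does not
                // depend on how character references are decoded.
                string value = (attr.Value ?? string.Empty).Trim();
                bool valueOk = AllowedUrlPrefixes.Any(
                    p => value.StartsWith(p, StringComparison.OrdinalIgnoreCase));

                if (!nameOk || !valueOk)
                    attr.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

You would call `HtmlWhitelistSanitizer.Sanitize(userHtml)` on the server before storing or rendering the input. Removing a disallowed element also discards its children; if you would rather keep the inner text, `node.ParentNode.RemoveChild(node, true)` keeps the grandchildren instead. Treat this as a starting point rather than a vetted sanitizer.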

carla
System Down
  • Sanitizing / whitelisting with HtmlAgilityPack is really hard because it is not good at parsing escape sequences. It does not recognize them without semicolons, while browsers do (e.g. `:`). https://anglesharp.github.io/ does this a lot better. – jakubiszon Sep 11 '18 at 14:44