
I know that a lot of questions about HTML sanitizers have appeared on SO, but I don't know if they do what I want. I'm a little confused, since some of the recommended approaches are more than four years old.

I have a page with the TinyMCE editor. Of course, this editor sends HTML to the server and expects HTML, so I have created an entity with a String property decorated with the [AllowHtml] attribute. It works well.

Now, I want to ensure that nobody can send a <script> tag, an <img onerror="">, or any other way of executing JS, or add CSS that points to external URLs.

What is the best solution at the moment?

WPL has the HtmlSanitizationLibrary, but how can I know which tags are considered "secure"?

WPL hasn't released anything since last April, and that was a beta. So I was wondering: is this project still active?

Cheers.

vtortola
  • It's still active. The Sanitizer, however, well, it's languishing these days. As folks have moved to XHTML, the sanitizer isn't up to the job, and a rewrite isn't on the table. As TinyMCE does produce correct XHTML markup, you can use Linq2Xml to query the DOM and sanitize to your heart's content. That's probably a better long-term solution (and, hmm, gives me an idea for a blog or two) – blowdart Dec 30 '11 at 22:36
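Following the LINQ to XML suggestion above, a minimal whitelist sanitizer might look like the sketch below. It assumes the input is a well-formed XHTML fragment. Named HTML entities such as `&nbsp;` are not defined in XML and make parsing fail, so the editor would have to emit numeric entities or raw characters (for example via TinyMCE's `entity_encoding` option). The class name and the allowed tag and attribute lists are hypothetical examples, not a vetted safe list. Script-like elements are dropped with their content, unknown wrappers are unwrapped so their text survives, every attribute that is not explicitly allowed is removed (including `onerror`, and `style`, which also blocks CSS `url(...)` references), and `href` must be an absolute http, https or mailto URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical whitelist sanitizer for well-formed XHTML fragments (LINQ to XML).
// The tag and attribute lists are illustrative assumptions, not a vetted safe list.
public static class XhtmlWhitelistSanitizer
{
    // Elements kept in the output (their attributes are still filtered).
    static readonly HashSet<string> AllowedTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "p", "br", "strong", "em", "b", "i", "u", "ul", "ol", "li", "a", "blockquote" };
    // Elements removed together with everything inside them.
    static readonly HashSet<string> DroppedTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "script", "style", "iframe", "object", "embed" };
    // Elements allowed to serialize as <x />; all others get an explicit end tag.
    static readonly HashSet<string> VoidTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "br" };
    // Lower-case attribute names allowed per element; anything else (onerror, style, ...) is removed.
    static readonly Dictionary<string, string[]> AllowedAttributes =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase) { ["a"] = new[] { "href", "title" } };

    public static string Sanitize(string xhtmlFragment)
    {
        // Wrap the fragment so several top-level nodes parse as one element.
        // Malformed markup makes XElement.Parse throw an XmlException: reject such input.
        var root = XElement.Parse("<root>" + xhtmlFragment + "</root>", LoadOptions.PreserveWhitespace);
        SanitizeChildren(root);
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    static void SanitizeChildren(XElement parent)
    {
        // Snapshot the children because the loop modifies the tree.
        foreach (XNode node in parent.Nodes().ToList())
        {
            if (node is XElement el)
            {
                string name = el.Name.LocalName;
                if (DroppedTags.Contains(name))
                {
                    el.Remove();
                }
                else if (!AllowedTags.Contains(name))
                {
                    // Unknown element (e.g. <div>, <span>, <img>): sanitize its content, then
                    // replace the element with that content so any text survives.
                    SanitizeChildren(el);
                    el.ReplaceWith(el.Nodes().ToList());
                }
                else
                {
                    FilterAttributes(el);
                    SanitizeChildren(el);
                    // Avoid "<p />", which an HTML parser treats as an unclosed <p>.
                    if (el.IsEmpty && !VoidTags.Contains(name)) el.Add(string.Empty);
                }
            }
            else if (node is XCData cdata)
            {
                cdata.ReplaceWith(new XText(cdata.Value)); // re-emit CDATA as escaped text
            }
            else if (!(node is XText))
            {
                node.Remove(); // drop comments and processing instructions
            }
        }
    }

    static void FilterAttributes(XElement el)
    {
        AllowedAttributes.TryGetValue(el.Name.LocalName, out string[] allowed);
        foreach (XAttribute attr in el.Attributes().ToList())
        {
            // HTML attribute names are case-insensitive, so compare in lower case.
            string name = attr.Name.LocalName.ToLowerInvariant();
            if (allowed == null || !allowed.Contains(name))
                attr.Remove();
            else if (name == "href")
            {
                if (TryNormalizeUrl(attr.Value, out string safeUrl)) attr.Value = safeUrl;
                else attr.Remove();
            }
        }
    }

    // Accepts only absolute http, https and mailto URLs (rejects javascript:, data:, relative URLs).
    static bool TryNormalizeUrl(string value, out string normalized)
    {
        normalized = null;
        if (!Uri.TryCreate(value.Trim(), UriKind.Absolute, out Uri uri)) return false;
        bool allowedScheme = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps
                             || uri.Scheme == Uri.UriSchemeMailto;
        if (allowedScheme) normalized = uri.AbsoluteUri;
        return allowedScheme;
    }
}
```

For example, `XhtmlWhitelistSanitizer.Sanitize("<p>Hello <script>alert(1)</script><strong>world</strong></p>")` returns `<p>Hello <strong>world</strong></p>`. Writing the validated `uri.AbsoluteUri` back into `href` means the browser receives exactly the URL that was checked.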

3 Answers


AntiXss/WPL is now 'end-of-life'. Found this library in a reply elsewhere:

HtmlSanitizer, a .NET library for cleaning HTML fragments from constructs that can lead to XSS attacks.

Project site: https://github.com/mganss/HtmlSanitizer

track0

WPL is the de-facto standard. Run the string through it and you are safe to print it unencoded:

@Html.Raw(Model.SomePropertyThatWasSanitizedWithWPL)
Darin Dimitrov
  • Yes, I use the Html.Raw method already. But my question is, which tags/attributes are stripped? – vtortola Dec 29 '11 at 13:42
  • @vtortola, the ` – Darin Dimitrov Dec 29 '11 at 13:43
  • The safe list isn't documented I'm afraid (outside of you delving into the code) – blowdart Dec 30 '11 at 22:37
  • Unfortunately this library seems to be useless in its current state. See the current [reviews](http://wpl.codeplex.com/releases/view/80289#ReviewsAnchor) – Trevor Elliott Oct 18 '13 at 17:23
  • @Trevor The newest review is from May 16 and the 4.3.0 version is dated June 2, so are you sure the latest version is useless? – QMaster Aug 11 '14 at 23:20
  • Considering WPL has reached end-of-life, it's not secure moving forward and was probably never secure to begin with. – Dan Bechard Dec 16 '16 at 18:56

You should probably go for a whitelist-based HTML sanitizer that actually understands HTML documents. Using regular expressions is generally not considered a safe approach.
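To see why a regular-expression blacklist is fragile, consider a deliberately naive filter (hypothetical, not taken from any library) that deletes `<script>` tags in a single pass. Nested input reassembles into a live script tag, and attribute-based vectors such as `onerror` pass straight through.

```csharp
using System.Text.RegularExpressions;

// Hypothetical blacklist filter, shown only to illustrate why regex-based
// sanitizing is fragile. Do not use this for real input.
public static class NaiveScriptStripper
{
    // Deletes every <script ...> and </script> tag in a single pass.
    static readonly Regex ScriptTag = new Regex(@"</?script[^>]*>", RegexOptions.IgnoreCase);

    public static string Strip(string html) => ScriptTag.Replace(html, string.Empty);
}
```

`NaiveScriptStripper.Strip("<scr<script>ipt>alert(1)</scr</script>ipt>")` returns `<script>alert(1)</script>`, and `Strip("<img src=x onerror=alert(1)>")` returns its input unchanged. A parser-based whitelist avoids both problems because it only emits constructs it has explicitly approved.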

The reason not to use Microsoft's AntiXss is that it isn't possible to enforce more detailed rules, such as what to do with a particular tag. As a result, tags are deleted completely even when it would make sense to preserve their text content. In addition, it no longer seems to be maintained.

HtmlRuleSanitizer allows you to define a sanitation strategy that exactly matches the HTML your editor is expected to generate, in the following manner:

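using Vereyon.Web; // namespace of the HtmlRuleSanitizer package

// dirtyHtml is the HTML string posted by the editor.
var sanitizer = new HtmlSanitizer();
// Keep <strong>, <b> (renamed to <strong>) and <i>, removing empty ones.
sanitizer.Tag("strong").RemoveEmpty();
sanitizer.Tag("b").Rename("strong").RemoveEmpty();
sanitizer.Tag("i").RemoveEmpty();
// Links open in a new window, are marked nofollow, and href must be a valid URL.
sanitizer.Tag("a").SetAttribute("target", "_blank")
    .SetAttribute("rel", "nofollow")
    .CheckAttribute("href", HtmlSanitizerCheckType.Url)
    .RemoveEmpty();

string cleanHtml = sanitizer.Sanitize(dirtyHtml);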

Or use a predefined sanitation strategy.

Christ A