I want to allow my users to input HTML.

Requirements

  1. Allow a specific set of HTML tags.
  2. Preserve characters (do not encode ã into an HTML entity such as &atilde;, for example)

Existing options

  1. AntiSamy. Unfortunately AntiSamy encodes special characters and breaks requirement 2.
  2. Native ColdFusion functions (HTMLCodeFormat() etc...) don't work as they encode HTML into entities, and thus fail requirement 1.
  3. I found this set of functions somewhere, but I have no way of telling how secure this is: http://pastie.org/2072867

So what are my options? Are there existing libraries for this?

Mohamad
  • Why not go with AntiSamy and figure out how to convert the HTML entities back? See this question's answer for doing this in Java with an Apache Commons library: http://stackoverflow.com/questions/994331/java-how-to-decode-html-character-entities-in-java-like-httputility-htmldecode – orangepips Jun 15 '11 at 15:57
  • @orangepips, because I read here that this could be potentially insecure: http://stackoverflow.com/questions/3246739/how-to-not-transform-special-characters-to-html-entities-with-owasp-antisamy/4052924#4052924 - not sure if that's true! – Mohamad Jun 15 '11 at 16:09
  • @Mohammad, interesting. Think you could turn around and use `HTMLEditFormat(cleanUpString)` to escape the example cited in that question. – orangepips Jun 15 '11 at 16:15
  • @orangepips, supposing I wanted to use your example, this fails: createObject("java", " StringEscapeUtils"); ... – Mohamad Jun 15 '11 at 16:15
  • @Mohammad: you need to download the jars, install them in ColdFusion's classpath, and restart ColdFusion to make the classes available. – orangepips Jun 15 '11 at 16:17
  • @orangepips, I'm sorry, I didn't understand what you meant. Use HTMLEditFormat() before using AntiSamy? – Mohamad Jun 15 '11 at 16:17
  • @orangepips, I'm thinking of going with your suggestion... I already loaded it through javaLoader and got the object to load... can you think of a reason why this is not safe? To be honest, that's beyond me! – Mohamad Jun 15 '11 at 16:32
  • @Mohamad: I'm sure there's a gap in there somewhere, but I think an attacker will need to be determined to find it. – orangepips Jun 15 '11 at 16:33
  • @orangepips, well, I got it to work. It just seems like a pain in the neck: 1. Send the markdown to AntiSamy, 2. Re-encode characters in the markdown, 3. Parse the markdown into HTML, etc. That's a lot of steps, and I'm not sure this approach is scalable! – Mohamad Jun 15 '11 at 16:40
  • @Mohamad: RE: scalable. Load test. – orangepips Jun 15 '11 at 16:48
  • Is there an option in AntiSamy that turns off encoding of special characters? – Henry Jun 15 '11 at 16:53
  • @Henry, unfortunately, no, there is not. There is a case open for it (http://code.google.com/p/owaspantisamy/issues/detail?id=99), but the project owners refuse to implement it. There is another case asking for directives for white-listed characters (http://code.google.com/p/owaspantisamy/issues/detail?id=101), but no word on when it would be implemented. – Mohamad Jun 15 '11 at 17:00

1 Answer

Portcullis works well with ColdFusion for attack-specific issues. I've also used a couple of other regex solutions I found on the web over time that have worked well, though they haven't been nearly as fleshed out. In 15 years (10 as a CMS developer) nothing I've built has been hacked... knock on wood.
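To give a rough idea of what a regex-style whitelist can look like, here is a minimal sketch in Java (callable from ColdFusion via createObject("java", ...)). It is not Portcullis's code or any of the functions mentioned above; the class name `TagWhitelist` and the list of allowed tags are assumptions made up for illustration. It escapes everything first, then restores only exact, attribute-free tags from the whitelist, and it leaves characters such as ã untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class TagWhitelist {
    // Assumed whitelist: simple formatting tags, with no attributes permitted.
    private static final Pattern ALLOWED = Pattern.compile(
        "&lt;(/?)(b|i|em|strong|p|ul|ol|li)&gt;", Pattern.CASE_INSENSITIVE);

    // Escape the characters that can start markup, an entity, or an attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Escape the whole input, then turn only exact whitelisted tags back into markup.
    static String sanitize(String input) {
        Matcher m = ALLOWED.matcher(escape(input));
        return m.replaceAll("<$1$2>");
    }
}
```

Because a tag with attributes (for example `<b onclick="...">`) never matches the exact pattern, it stays escaped. The trade-off is that this sketch cannot allow links or images, which need attribute-level checks that a full sanitizer such as AntiSamy provides.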

When developing input fields of any type, it's good to look at the problem from different angles. You've got the UI side, which includes both usability and client-side validation. Yes, it can be bypassed, but JavaScript-based validation is quicker, more responsive, and rates higher on the magical UI scale than interrupting the user from the back end or simply making things "disappear" without warning. It also lightens the back-end validation, because it does the initial screening. So it's not an "instead of" but an "in addition to" type of solution, and it shouldn't be ignored.

Also on the UI front, giving your users a good-quality editor can make a huge difference in the process. My personal favorite is CKEditor, simply because it's the only one that can handle Microsoft Word code on the front end, keeping it far away from my DB. It seems silly, but Word HTML is valid, so it won't set off any red flags. On a moderately sized document, though, it will quickly exceed a DB field's maximum insert size, believe it or not. A good editor not only reduces the amount of silly HTML that comes in, it also makes things faster for the user: win/win.

I personally encode and decode my characters. It has always just worked well, so I've never changed the practice.
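If you decode entities after sanitizing, one cautious way to do it is to decode only entities that stand for non-ASCII letters and leave everything else (`&lt;`, `&gt;`, `&amp;`, `&quot;`, numeric forms of ASCII characters) encoded, so decoding cannot reintroduce markup. The Java sketch below works under that assumption. The class name `SafeEntityDecoder` and its tiny table of named entities are made up for illustration; a real table would need to cover many more names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class SafeEntityDecoder {
    // Small illustrative table of named entities (assumption: extend as needed).
    private static final Map<String, Integer> NAMED = Map.of(
        "atilde", 0xE3, "eacute", 0xE9, "ntilde", 0xF1, "uuml", 0xFC);

    // Matches decimal (&#227;), hexadecimal (&#xE3;) and named (&atilde;) entities.
    private static final Pattern ENTITY = Pattern.compile(
        "&(?:#(\\d{1,7})|#[xX]([0-9a-fA-F]{1,6})|([A-Za-z][A-Za-z0-9]*));");

    static String decodeNonAscii(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int cp;
            if (m.group(1) != null) {
                cp = Integer.parseInt(m.group(1));
            } else if (m.group(2) != null) {
                cp = Integer.parseInt(m.group(2), 16);
            } else {
                cp = NAMED.getOrDefault(m.group(3), -1);
            }
            // Decode only letters outside ASCII; anything else stays encoded as it was.
            boolean safe = cp > 0x7F && Character.isValidCodePoint(cp) && Character.isLetter(cp);
            String replacement = safe ? new String(Character.toChars(cp)) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Restricting the decode step to letters is a deliberate choice: punctuation look-alikes and ASCII characters expressed as numeric entities are left alone. Treat this as a starting point to review and load test, not as a guarantee of safety.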

bpeterson76