1

Overflowed Stack,

I have a Java web application (Tomcat) in which I allow the user to upload HTML code through a form.

Since I am running on Tomcat and actually display the user-uploaded HTML, I do not want a user to upload malicious JSP tags/scriptlets/EL and have them executed on the server. I want to filter out any JSP/non-HTML content.

Writing a parser myself seems too onerous, not to mention the many subtleties one has to take care of (comments, byte representation of the scripts, etc.).

Do you know of any API/library which does this for me? I know about Caja filtering, but am looking for something specifically for JSPs.

Many Thanks, JP, Malta.

MalteseUnderdog

6 Answers

2

Don't worry about executing JSP code. Your JSP will be turned into a servlet once, so you will have something like:

out.println(contents);

and the contents won't be evaluated as JSP code. But you must worry about malicious JavaScript.

Bozho
2

Using a library for content cleaning is better than trying to do it yourself with, e.g., regexes.

Try AntiSamy from the Open Web Application Security Project (OWASP).

http://www.owasp.org/index.php/Antisamy

I haven't used it (yet), but it seems suitable. JSP content should be automatically removed/escaped by the HTML normalization.

Edit, just found these:
Best Practice: User generated HTML cleaning
RegEx match open tags except XHTML self-contained tags

Community
Markus Kull
  • Interesting post. Would be even more interesting to compare to Google's Caja - which seems to be the de facto standard in this area. – MalteseUnderdog Aug 10 '10 at 09:51
  • Didn't know about Caja before, this is interesting. Seems to be especially suited for embedding third-party widgets. – Markus Kull Aug 10 '10 at 11:02
2

Just save it as *.html, not as *.jsp; then it won't be passed through the JspServlet, which does all the taglib/EL processing work. All taglibs/EL will end up plain (unparsed) in the response.

BalusC
  • Thanks Balus, but we cannot do that as we add JSP scriptlets to the user-uploaded content ourselves (so we need to render those). – MalteseUnderdog Aug 12 '10 at 07:24
0

I'm not sure I have understood your question completely, but if you want to remove all content surrounded by "<%@ .. %>", you can replace it with a regex.

// (?s) lets '.' match newlines; no mandatory spaces, so "<%@page ...%>" is matched too
String resultString = subjectString.replaceAll("(?s)<%@.*?%>", "");
Floyd
  • That is too onerous to maintain - what about and a thousand other tags? Note that the namespace might be renamed, e.g. . This is why I am looking for a library to do that. – MalteseUnderdog Aug 10 '10 at 08:55
0

I don't have a library to remove JSP tags, but you can write a little one based on regexps that would:

  • delete all "<% %>" tags
  • delete all HTML tags that contain the ':' character (to avoid "" tags, for example)

I don't know whether all potentially malicious Java code is covered by these two filters, but it is a good start...
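
The two filters above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `JspStripper` is hypothetical, the colon rule is narrowed to tag names (so URLs inside attributes survive), attribute values containing ">" are not handled, and EL expressions such as ${...} are not touched at all. The loop re-runs the filters because deleting one match can glue a new "<%" together from the surrounding text.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; the name and the exact rules are illustrative assumptions. */
final class JspStripper {
    // <% ... %> in all its forms: scriptlets, expressions (<%=), declarations (<%!),
    // directives (<%@) and JSP comments (<%-- --%>). DOTALL lets '.' span newlines.
    private static final Pattern SCRIPTING = Pattern.compile("<%.*?%>", Pattern.DOTALL);

    // Opening, closing or self-closing tags whose *name* contains a ':' (a taglib prefix).
    private static final Pattern PREFIXED_TAG =
            Pattern.compile("</?[A-Za-z][\\w.-]*:[\\w.-]+(?:\\s[^>]*)?/?>", Pattern.DOTALL);

    static String strip(String html) {
        String out = html;
        String previous;
        // Repeat until nothing changes: removing a match may assemble a new one.
        do {
            previous = out;
            out = SCRIPTING.matcher(out).replaceAll("");
            out = PREFIXED_TAG.matcher(out).replaceAll("");
        } while (!out.equals(previous));
        // An unterminated "<%" would still open a scriptlet; neutralize what is left.
        return out.replace("<%", "&lt;%");
    }
}
```

For example, `JspStripper.strip("<p>Hi</p><% System.exit(0); %>")` leaves only the paragraph. A regex pass like this is a first line of defence at best, not a replacement for a real parser or sanitizing library.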

Another solution, but a little more complicated: use an HTTP proxy server (Apache httpd, Nginx, etc.) that serves static resources (CSS, images, HTML pages) directly and forwards only dynamic resources (JSP and .do actions, for example) to Tomcat. When a file is uploaded, you force the file extension to ".html". You can then be sure (thanks to the HTTP proxy) that the file will not be interpreted by Tomcat.
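
Forcing the extension on upload might look like the sketch below. The class and method names (`UploadNames`, `toHtmlName`) are hypothetical, and the character whitelist is an assumption chosen to be conservative; the point is only that the stored name never depends on what the client claimed.

```java
/** Hypothetical helper that derives a storage name from a client-supplied file name. */
final class UploadNames {
    /** Returns a name that always ends in ".html", whatever the client sent. */
    static String toHtmlName(String clientName) {
        // Drop any directory part; check both separators, the client OS is unknown.
        int slash = Math.max(clientName.lastIndexOf('/'), clientName.lastIndexOf('\\'));
        String base = clientName.substring(slash + 1);
        // Drop the last extension, if there is one after the first character.
        int dot = base.lastIndexOf('.');
        if (dot > 0) {
            base = base.substring(0, dot);
        }
        // Keep a conservative character set; everything else (including '.') becomes '_'.
        base = base.replaceAll("[^A-Za-z0-9_-]", "_");
        if (base.isEmpty()) {
            base = "upload";
        }
        return base + ".html";
    }
}
```

With this, "evil.jsp" is stored as "evil.html" and "shell.jsp.html" as "shell_jsp.html", so no stored name can end in ".jsp".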

Benoit Courtine
0

If the pages supplied by the users aren't mentioned in the web.xml and you don't have a rule "anything that ends with *.jsp is a JSP" in web.xml, Tomcat won't try to compile/run them.

What is much more important: you must filter the HTML, or users could add arbitrary JavaScript which would then steal other users' passwords. This is non-trivial. Try to clean the code with JTidy to get XML, and then remove all <script>, <link> and <object> tags, maybe even <img> (unless you make sure the URLs supplied are valid; some buggy browsers might run JavaScript if the image source is actually text/javascript), and all CSS styles, and make sure any href points to a safe URL. Don't forget <iframe> and <applet> and all the other things that might break your secure shell.
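
As one small piece of that, checking that an href points to a "safe" URL could be sketched as a scheme allowlist. The class name `UrlCheck`, the set of allowed schemes, and the decision to reject relative URLs are all assumptions made for illustration; `Set.of` requires Java 9 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical scheme allowlist for href/src values taken from user HTML. */
final class UrlCheck {
    // Assumption: only these schemes are considered safe for links.
    private static final Set<String> ALLOWED = Set.of("http", "https", "mailto");

    static boolean isSafeHref(String href) {
        try {
            URI uri = new URI(href.trim());
            String scheme = uri.getScheme();
            if (scheme == null) {
                return false; // relative or protocol-relative URL: rejected in this sketch
            }
            return ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // anything that does not parse cleanly is rejected
        }
    }
}
```

Allowing only known-good schemes, instead of blocking "javascript:" by name, means case tricks such as "JaVaScRiPt:" and less common schemes are rejected by default.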

[EDIT] That should give you an idea of where this is going. In the end, you should do the reverse: Allow only a very small subset of HTML -- if at all. Most sites (like this one) use special markup for the formatting for two reasons:

  1. It's simpler for the user
  2. It's more secure
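
The "allow only a very small subset" approach can be sketched as: escape everything first, then restore a handful of attribute-free tags. The class name `TinyAllowlist` and the particular tag list are assumptions for illustration; a real site would more likely rely on a maintained sanitizer or a separate markup language.

```java
import java.util.regex.Pattern;

/** Hypothetical minimal allowlist: everything is escaped, a few bare tags are restored. */
final class TinyAllowlist {
    // Assumption: only these tags, without any attributes, are permitted.
    private static final Pattern ALLOWED = Pattern.compile(
            "&lt;(/?)(b|i|em|strong|p|ul|ol|li|br)&gt;", Pattern.CASE_INSENSITIVE);

    /** Escapes the characters that are significant in HTML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    /** Escapes the whole input, then turns the allowed escaped tags back into markup. */
    static String sanitize(String input) {
        return ALLOWED.matcher(escape(input)).replaceAll("<$1$2>");
    }
}
```

Because every "<" is escaped before anything is restored, "<script>", tags with attributes such as onclick, and JSP "<% %>" blocks all come out as inert text. EL expressions (${...}) are not affected by HTML escaping and would need separate handling if the result is stored in a JSP.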
Aaron Digulla
  • This is not exact. If you have a jsp page you do not need to list it in web.xml for it to run. The user uploaded content gets saved to a jsp page, after I do some processing on it. – MalteseUnderdog Aug 10 '10 at 09:53
  • So you **intentionally** create a new JSP? That sounds *very* dangerous to me. If someone proposed that to me, I'd say "No" or "It will take me half a year to get right." Don't do it if you can avoid it. – Aaron Digulla Aug 10 '10 at 12:25
  • Well, since the people who decided this obviously don't care about security in any way, why bother with filtering? – Aaron Digulla Aug 12 '10 at 09:50