We use JSP, servlets, and beans with a MySQL database. We don't want to restrict the characters users can enter in form fields. So how do I sanitize the input, and how do I make sure the output has not been altered for malicious purposes? Is there a way, while sending the output, to check whether extra code has been injected? For example, suppose there is a search input field and the user enters something like <script>alert("I am here")</script>. Is there any way I could detect that this is an HTML tag? And if the user appends an extra parameter to a link field, is there a before-and-after check I could run on the document to detect the extra link field?

Please consider renaming your question to something like "How best to sanitize input in Java" - your question title won't help those looking for similar answers in future... – razlebe Apr 17 '09 at 18:21
Whoa, when you go to retag the question, it executes the javascript in the question! Bad stackoverflow, bad! – Rob Hruska Apr 17 '09 at 18:22
3 Answers
Give jsoup a go to help you out with this. Whatever you do, don't try to hack this up using regex or something, because then you'll have 2 problems. :-)
With jsoup, all you need is a short snippet of code:
String safe = Jsoup.clean(unsafe, Whitelist.basic());
You can add tags and attributes to Whitelist fairly easily, though I found it doesn't support namespace tags.

Note that `Whitelist` has been deprecated (will be removed in 1.15.1) in favor of `Safelist` – Karl Galvez Aug 28 '21 at 15:13
You really should allow users to input as little HTML and/or javascript as possible. One good solution to validating and sanitizing this stuff is to use a ready-made library like OWASP AntiSamy.
Also, take a look at OWASP Enterprise Security API for a collection of security methods that a developer needs to build a secure web application.

You should always do basic HTML-escaping of data taken from sources like user input or the database that might contain characters with special meaning in HTML. The <c:out> JSP tag does this, for example. That way, if the user enters "<script> ..." in a field and you print it back again, it is written to the HTML as "&lt;script&gt; ...", which the browser displays as text instead of executing it.
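For output written outside a JSP page (say, directly from a servlet), the same escaping can be approximated by hand. This is only a minimal sketch: the `HtmlEscaper` class and its `escape` method are hypothetical names invented for illustration, not part of JSTL or any library mentioned here. It covers HTML text and quoted attribute values only, not JavaScript, CSS, or URL contexts. A maintained encoder library is usually the safer choice in production.

```java
// Hypothetical helper, for illustration only: escapes the five characters
// that are significant in HTML text and quoted attribute values.
final class HtmlEscaper {
    private HtmlEscaper() {}

    static String escape(String input) {
        if (input == null) {
            return ""; // treat missing values as empty output
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, HtmlEscaper.escape("<script>") returns "&lt;script&gt;", so the search term from the question would be shown back to the user as plain text. The stored data stays exactly as the user typed it. Escaping is applied only at output time, which fits the requirement of not restricting input characters.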
