How to validate JavaScript and HTML code?

Question

In our application, end users are given a textbox where they can paste their own HTML or JavaScript code to create an advertisement, much like Google's advertisements. I need to validate this HTML and JS source code, both for malicious code and for correct syntax. Are there any APIs available in Java to do this?

Thanks in advance, Ali.

There isn't really any surefire way to validate raw HTML/Javascript. — Marie, Mar 02 '16 at 14:25
For HTML validation JTidy can be used, but for JavaScript validation and malicious-code detection I can't find anything. — ali jawadi, Mar 02 '16 at 14:32
I agree with Marie: if you allow the user to enter _any_ HTML and JavaScript you'll have no idea of what that will do, especially the JavaScript. The script might load some malicious code from an external source or might access a valid one - and the user might trick you there by changing the source later on. You'd probably be better off allowing only a specific set of functions that you provide. — Thomas, Mar 02 '16 at 14:33
It's quite pointless to sanitize on the client-side, because the user can easily bypass your protections by using a proxy. You should _never_ trust user input sent to the server. — Brett, Mar 02 '16 at 14:35
But Google also allows users to create advertisements in this manner. — ali jawadi, Mar 02 '16 at 14:42
@ali they've developed Caja to do this. See my answer. They don't do validation, they have a sandbox instead. — tucuxi, Mar 02 '16 at 15:39
Thanks @tucuxi and Brett for your answers, and everybody for your valuable input. I'm now convinced that it is very difficult to avoid malicious code in JS. — ali jawadi, Mar 03 '16 at 08:25

Brett · Answer 1 · 2016-03-02T14:37:49.213

0

You can find many resources on the Internet if you search for them. Here are two: the Java Encoder Project and the Java HTML Sanitizer. I've never used them, but they are a starting point. You can learn a lot if you do the research yourself.

Edit: It's unclear if you're looking for a Java API or a JavaScript API. They're quite different.

edited Mar 02 '16 at 14:37

answered Mar 02 '16 at 14:28


score 0 · Accepted Answer · edited May 23 '17 at 11:59

Validating JS client-side only helps your well-behaved users, since malicious users can bypass any client-side validation anyway (by tampering with the JS that is supposed to do the validation).

Validating JS server-side to look for "maliciousness" is, in general, impossible unless you check against a very restrictive whitelist. It is better to execute things in a sandbox that protects against bad behavior, and to avoid validation (= checking for validity in advance of execution) altogether.
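
To make the "very restrictive whitelist" idea concrete, here is a minimal sketch in plain Java (standard library only). The class name, the chosen tags, and the https-only link rule are all assumptions made for illustration, not an established policy or any library's API. It accepts input only if every token is plain text or one of a few exact tag shapes, so scripts, event-handler attributes, and unknown tags are rejected by default. It checks vocabulary only, not nesting or well-formedness, and it is not a substitute for a maintained sanitizer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical allow-list checker: everything not explicitly listed is rejected.
final class AdMarkupAllowList {
    // Plain text: any run of characters that cannot open or close a tag.
    private static final String TEXT = "[^<>]++";
    // Attribute-free tags from a small fixed set (an assumed policy).
    private static final String SIMPLE_TAG =
        "</?(?:b|i|em|strong|p|div|span)>|<br ?/?>";
    // Links: exactly one attribute, an https URL built from a narrow character set.
    private static final String LINK_TAG =
        "<a href=\"https://[A-Za-z0-9./_~%?=&-]+\">|</a>";

    private static final Pattern TOKEN =
        Pattern.compile(TEXT + "|" + SIMPLE_TAG + "|" + LINK_TAG);

    /** Returns true only if the whole input consists of allowed tokens. */
    static boolean isAllowed(String markup) {
        if (markup == null) {
            return false;
        }
        Matcher m = TOKEN.matcher(markup);
        int pos = 0;
        while (pos < markup.length()) {
            m.region(pos, markup.length());
            if (!m.lookingAt()) {
                return false; // something outside the allow-list starts here
            }
            pos = m.end(); // every token consumes at least one character
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(
            "<p>Buy <b>now</b> at <a href=\"https://example.com/shop\">our shop</a></p>")); // true
        System.out.println(isAllowed("<script>alert(1)</script>"));      // false
        System.out.println(isAllowed("<b onclick=\"steal()\">hi</b>"));  // false
    }
}
```

Note how little survives such a check: no JavaScript at all passes it, which is exactly why a whitelist this strict rarely fits an "arbitrary ad code" requirement.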

So: JavaScript sandboxes. The most widely used is probably Google Caja, which also protects against bad HTML/CSS. Sandboxing is not easy: in particular, Caja needs a server-side component to "cajole" the files and protect the host page, and any parts of the host page outside the cajoled div need to be identified up front.

See also some alternatives from another SO question. Note that many of them do not allow DOM access from protected code, and are therefore not useful for JS that actually has to show things on screen.
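
As a rough illustration of "sandbox instead of validate", the sketch below uses a different mechanism from Caja: the browser's own `sandbox` attribute on an `<iframe>`, with the untrusted markup passed through `srcdoc`. The class and method names are hypothetical. With `allow-scripts` but without `allow-same-origin`, the ad's script runs inside its own frame, so it can draw there, but it is treated as a foreign origin and cannot reach the host page's DOM or cookies. This limits the damage rather than eliminating it (the ad can still make network requests, for example), so treat it as a starting point only.

```java
// Hypothetical helper that wraps untrusted ad markup in a sandboxed iframe.
final class SandboxedAdFrame {

    /** Escapes a string for use inside a double-quoted HTML attribute value. */
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Builds the iframe element. The markup is never parsed or validated here;
     * it is only escaped so that it cannot break out of the srcdoc attribute.
     */
    static String wrap(String untrustedMarkup, int width, int height) {
        return "<iframe sandbox=\"allow-scripts\""
             + " width=\"" + width + "\""
             + " height=\"" + height + "\""
             + " srcdoc=\"" + escapeAttribute(untrustedMarkup) + "\"></iframe>";
    }

    public static void main(String[] args) {
        // The host page would embed this string where the ad should appear.
        System.out.println(wrap("<script>document.write(\"hi\")</script>", 300, 250));
    }
}
```

The design choice here is that the server does no judging of the code at all; it only guarantees the code stays inside the attribute, and leaves containment to the browser.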
