I have a rich text area where the user can type something. I am trying to prevent JavaScript injection using the following regex:
return input == null ? null : input.replaceAll("(?i)<script.*?>.*?</script.*?>", "") // case 1
.replaceAll("(?i)<.*?javascript:.*?>.*?</.*?>", "") // case 2
.replaceAll("(?i)<.*?\\s+on.*?>.*?</.*?>", ""); // case 3
Above, input is the text from the rich text area, and I am using this regex to avoid possible JavaScript injections.
The problem is case 3. If the user's text contains "on", all the text before "on" gets removed.
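A minimal reproduction of the case 3 behaviour, as a sketch. The class name Case3Demo, the helper stripCase3 and the sample string are hypothetical and only for illustration; the pattern is the one from case 3. Because .*? may cross tag boundaries, the match can start at an unrelated "<", run through the plain word "on", and end at a later closing tag:

```java
final class Case3Demo {
    // Case 3 pattern from the snippet above, unchanged.
    static final String CASE_3 = "(?i)<.*?\\s+on.*?>.*?</.*?>";

    // Hypothetical helper: applies only case 3 so its effect is isolated.
    static String stripCase3(String input) {
        return input == null ? null : input.replaceAll(CASE_3, "");
    }

    public static void main(String[] args) {
        // Harmless sample text: no event handler attribute anywhere,
        // just the ordinary word "on" between two tags.
        String sample = "<b>Hi</b> turn on the light <i>x</i>";

        // The match starts at "<b", passes through " on",
        // and ends at "</i>", so the whole sample is removed.
        System.out.println("[" + stripCase3(sample) + "]");
    }
}
```

With this sample the entire string is matched and removed, including the text before "on".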
How can I make the last case more rigid and avoid the above problem?
Did you think about escaping the HTML (including JavaScript) instead of removing it?
– Igor Deruga Jan 04 '17 at 19:35
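The escaping approach suggested in the comment could look roughly like the sketch below. It uses only the JDK; the class name HtmlEscaper and the method escape are hypothetical, and the sketch assumes the text is later rendered as plain HTML text content, not placed inside a script block or a URL. Instead of deleting anything, it replaces the five characters that are significant in HTML with entities, so markup typed by the user is displayed literally:

```java
final class HtmlEscaper {
    private HtmlEscaper() {}

    // Hypothetical helper: replaces HTML-significant characters with entities.
    // Returns null for null input, mirroring the original snippet.
    static String escape(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The script tag survives as visible text instead of being executed.
        System.out.println(escape("<script>alert(1)</script>"));
        // Ordinary text containing "on" is left intact.
        System.out.println(escape("Click on the button"));
    }
}
```

The trade-off is that escaping everything also disables legitimate formatting tags produced by the rich text area, so this only fits if the stored text does not need to be rendered as HTML.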