I am implementing an XSS filter for my web application and also using the ESAPI encoder to sanitise the input.

The patterns I am using are given below:

 // Script fragments
Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
// src='...'
Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
Pattern.compile("src[\r\n]*=[\r\n]*\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// lonely script tags
Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
Pattern.compile("<script(.*?)>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// eval(...)
Pattern.compile("eval\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// expression(...)
Pattern.compile("expression\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// javascript:...
Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
// vbscript:...
Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
// onload(...)=...
Pattern.compile("onload(.*?)=", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL)

However, a few scripts are still not being filtered, especially ones that are appended to a parameter, such as:

url?sourceId=abx;alert('hello');

How do I handle these?

Andrea Ligios
Cool Techie
  • It's unclear what the possible attack vectors are with that input. Without knowing how you are actually handling the input and output, it's unreasonable to ask us to come up with or check your input sanitation strategy. – nhahtdh Jul 09 '15 at 08:20
  • @nhahtdh As mentioned, I have a filter that cleans any input matching the patterns mentioned above. But this one specific case, where the script is attached to a parameter, doesn't get cleaned. – Cool Techie Jul 09 '15 at 09:15
  • Encode the output, not the input. (That includes outputting to the likes of SQL as well as HTML.) – Tom Hawtin - tackline Jul 09 '15 at 14:09
  • What @Tom said. Trying to fix injection problems at the input stage, especially using blacklisting, is a strategy that cannot ever be reliable. This is a waste of your time. – bobince Jul 09 '15 at 16:18
  • @CoolTechie with this approach, I will still be able to attack your application with an automated fuzzer and continuously find corner cases that you never handled. Input sanitization should be a defense-in-depth strategy applied *after* you have already ensured proper output escaping. – avgvstvs Oct 15 '15 at 17:27
  • You don't show how you are using those expressions either. Be careful that you are not doing just a nonrecursive removal, since stripping `<script>` from `<scr<script>ipt>` will still leave you with `<script>` after the first pass. Most frameworks and languages already have built-in functions to encode and decode special characters for exactly this purpose. This is by no means a novel problem, so you should question whether you really need to reinvent the wheel or not. – S.C. Jul 28 '16 at 04:26
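
The single-pass pitfall from the last comment can be reproduced with a few lines of Java. This is only a sketch: the class and method names are hypothetical, the pattern is the "lonely script tags" one from the question, and the nested payload is an assumed example of what an attacker might send.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the asker's filter.
class SinglePassStripDemo {
    // Same expression as the "<script(.*?)>" entry in the question.
    private static final Pattern SCRIPT_OPEN = Pattern.compile(
            "<script(.*?)>",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

    /** Removes every match exactly once, as a naive filter would. */
    static String stripOnce(String input) {
        return SCRIPT_OPEN.matcher(input).replaceAll("");
    }

    /** Repeats the removal until the string stops changing. */
    static String stripUntilStable(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = stripOnce(previous);
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // Assumed payload: a script tag nested inside a broken script tag.
        String payload = "<scr<script>ipt>alert(1)</script>";
        // The single pass reassembles a working opening tag.
        System.out.println(stripOnce(payload));
        // Looping removes the reassembled tag as well.
        System.out.println(stripUntilStable(payload));
    }
}
```

Looping until the output is stable closes this particular hole, but it is still a blacklist and does nothing for vectors the patterns never mention.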

1 Answer

This isn't the right approach. It's mathematically impossible to write a regex capable of correctly punting XSS. (Regular expressions match only regular languages, but HTML and JavaScript are not regular; their grammars are at least context-free.)
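
To make the blacklist problem concrete, here is a rough sketch that runs the question's patterns over a few inputs. The class name, the `sanitize` helper, and the `<img ... onerror=...>` payload are assumptions added for illustration; the asker did not show how the patterns are actually applied.

```java
import java.util.regex.Pattern;

// Hypothetical harness around the patterns listed in the question.
class BlacklistBypassDemo {
    private static final int MULTI =
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL;

    // Patterns transcribed from the question.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", MULTI),
        Pattern.compile("src[\r\n]*=[\r\n]*\\\"(.*?)\\\"", MULTI),
        Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("<script(.*?)>", MULTI),
        Pattern.compile("eval\\((.*?)\\)", MULTI),
        Pattern.compile("expression\\((.*?)\\)", MULTI),
        Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("onload(.*?)=", MULTI)
    };

    /** Assumed usage: remove every match of every pattern, in order. */
    static String sanitize(String value) {
        String result = value;
        for (Pattern pattern : PATTERNS) {
            result = pattern.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        // The obvious case is removed entirely.
        System.out.println(sanitize("<script>alert(1)</script>"));
        // The parameter value from the question matches no pattern.
        System.out.println(sanitize("abx;alert('hello');"));
        // An unquoted src plus an onerror handler also matches nothing.
        System.out.println(sanitize("<img src=x onerror=alert(1)>"));
    }
}
```

The second and third inputs come out unchanged. Whether they are dangerous depends entirely on where the value is later written, for example into an inline script or into HTML markup, which is why the fix belongs at output time.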

You can, however, guarantee that whenever you switch contexts (that is, hand off a piece of data that is going to be interpreted), the data is correctly escaped for the new context. So, when sending data to a browser, escape it for HTML if it's being handled as HTML, or for JavaScript if it's being handled by JavaScript.
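
As a minimal sketch of what escaping per context means, the hand-rolled helpers below use only the standard library. They are illustrative and their names are hypothetical; in a real application you would more likely call a maintained encoder, such as the `encodeForHTML` and `encodeForJavaScript` methods on the ESAPI encoder the question already uses.

```java
// Hypothetical illustration of context-specific output escaping.
class OutputEscapingSketch {
    /** Escapes the characters that are significant in HTML text and quoted attributes. */
    static String escapeHtml(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes for use inside a quoted JavaScript string: non-alphanumerics become four-digit hex escapes. */
    static String escapeJsString(String value) {
        StringBuilder out = new StringBuilder(value.length() * 2);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sourceId = "abx;alert('hello');";
        // Written into HTML markup: the quotes can no longer break out of an attribute.
        System.out.println("<span>" + escapeHtml(sourceId) + "</span>");
        // Written into a quoted JavaScript string: the value stays inside the literal.
        System.out.println("var sourceId = '" + escapeJsString(sourceId) + "';");
    }
}
```

The same input gets a different encoding depending on where it lands, which is the point: the stored value stays as the user sent it, and the escaping is chosen at the moment of output.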

If you DO need to allow HTML/JavaScript into your application, then you'll want a web-application firewall or a framework like HDIV.

avgvstvs
  • `It's mathematically impossible to write a regex capable of correctly punting XSS`. That is, assuming of course, that the regex engine *strictly* adheres to regular expressions. Most engines do not. – Rob Jun 21 '17 at 07:10
  • @Rob could you share an example of a regex implementation where the regular language has been transformed into a context-free grammar? – avgvstvs Jun 25 '17 at 20:57