Sanitize – kamilk Sep 27 '15 at 17:35

    http://stackoverflow.com/questions/1659749/script-tag-in-javascript-string Here somebody encountered a problem resulting from this. It was 2009; maybe browsers have become smarter since then, I honestly don't know. One solution is given there. My preferred solution would be not to pass data like this, but instead to place it in a `data` attribute or a hidden field, though I guess that doesn't answer the question. – kamilk Sep 27 '15 at 17:37
  • 2 Answers


    Edited for non-mutation of data.

    If I'm interpreting this correctly, you want to prevent the user from ending the script tag prematurely from within the user-submitted string. For HTML that can be done just as you stated, by adding a backslash to the closing tag: <\/script>. That is the only escaping you should have to worry about in that case. You shouldn't need to escape HTML comments, as the browser will interpret them as part of the JavaScript. If some older browsers don't default script tags to text/javascript (language="javascript" is deprecated), adding type='text/javascript' may be necessary.
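As a rough illustration of the backslash approach, here is a minimal sketch. The helper name `escapeScriptClose` is hypothetical, and the use of `JSON.stringify` to produce the quoted literal is an assumption taken from the template mentioned in the comments further down, not something this answer prescribes.

```javascript
// Hypothetical helper: rewrite "</script" as "<\/script" inside text that is
// already a quoted JavaScript/JSON string literal. Inside such a literal,
// "\/" is just another spelling of "/", so the decoded data is unchanged.
function escapeScriptClose(literalSource) {
  return literalSource.replace(/<\/(script)/gi, "<\\/$1");
}

const userInput = 'hello </script><script>alert(1)</script>';
const literal = JSON.stringify(userInput); // quoted string literal
const safeLiteral = escapeScriptClose(literal);

// The emitted line no longer contains a sequence that closes the script element.
console.log("var data = " + safeLiteral + ";");
```

Because `\/` is a valid escape in both JSON and JavaScript strings, parsing `safeLiteral` gives back the original input, so the data itself is not mutated. This only covers the closing-tag sequence, not the other cases discussed in this thread.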


    Based on Mike Samuel's answer here, I may have been wrong about not needing to escape HTML comments. However, I was not able to reproduce the problem in Chrome or Chromium.

    Cody Gustafson
    • This would make the string safe, but it would also mutate my data. I don't necessarily want to do that; if a JS object in my payload has a string property with an `&` inside it, then when I resave the data it becomes `&amp;`, the next time I retrieve and save it, it becomes `&amp;amp;`, and so on. The encoding process must not affect the integrity of the data. – Jackson Sep 27 '15 at 03:20
    • I don't want the data to ever be mutated. Unescaping data is also a form of mutation. – Jackson Sep 27 '15 at 03:45

    Assuming that you're doing this:

    Payload is set to

    var data = '[this is user controlled data]';
    

    and the rest of the code (assignment, quotes and semicolon) is generated by your application, then the encoding you want is hex entity encoding, that is, JavaScript `\xHH` escapes rather than HTML entities.

    See the OWASP XSS Prevention Cheat Sheet, Rule #3 for more information. This will convert

    </script><script>alert("Muahahaha!")
    

    into

    var data = '\x3c\x2fscript\x3e\x3cscript\x3ealert\x28\x22Muahahaha\x21\x22\x29';
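A minimal sketch of such an encoder follows. The function name `hexEscapeForJsString` is hypothetical and not part of any library; the choice to leave only ASCII letters and digits unescaped, and to fall back to `\uHHHH` for code units above 0xFF, is an assumption made for this illustration.

```javascript
// Hypothetical helper: escape every UTF-16 code unit except ASCII letters and
// digits so the result can be placed between quotes in a JS string literal.
function hexEscapeForJsString(value) {
  const text = String(value);
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch; // safe as-is
    } else if (code < 0x100) {
      out += "\\x" + code.toString(16).padStart(2, "0"); // \xHH form
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0"); // \uHHHH form
    }
  }
  return out;
}

const payload = '</script><script>alert("Muahahaha!")';
// Prints the same assignment line as shown above.
console.log("var data = '" + hexEscapeForJsString(payload) + "';");
```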
    

    Try this and you will see that it stores the user-supplied string exactly as entered, no matter what characters it contains. It also takes care of encoding single and double quotes. As a bonus, it is also suitable for storing in HTML attributes:

    <a onclick="alert('[user data]');" />
    

    which normally would have to be HTML encoded again for correct display (because &amp; inside an HTML attribute is interpreted as &). However, hex entity encoding does not include any HTML characters with special meaning so you get two for the price of one.
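Both claims (the string survives unchanged, and nothing HTML-special is left in the output) can be checked with a small sketch. The `hexEscape` function below is a hypothetical stand-in for whatever encoder you use, and `new Function` is used purely to simulate the browser evaluating the literal.

```javascript
// Hypothetical compact encoder: everything except ASCII letters and digits
// becomes \xHH (or \uHHHH for code units above 0xFF).
const hexEscape = (s) =>
  s.replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });

const original = `Tom & "Jerry" <b>'s</b> \\ \u2028 end`;
const escaped = hexEscape(original);

// No character that HTML markup or attribute quoting treats specially survives.
const hasSpecial = /[<>&"']/.test(escaped);

// Evaluate the literal as a script would, for demonstration only.
const roundTripped = new Function("return '" + escaped + "';")();

console.log(hasSpecial, roundTripped === original); // prints: false true
```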

    Update from comments

    The OP indicated that the server-side code would be generated in the form

    var data = <%= JSON.stringify(data) %>;
    

    The above still applies. It is up to the JSON class to properly hex entity encode values as they're inserted into the JSON. This cannot easily be done outside of the class, as you'd effectively have to parse the JSON again to determine the current language context. I wouldn't recommend the simple option of only escaping the forward slash in </script>, because other sequences, such as CDATA closing tags, can also end the grammar context. Escape properly and your code will be future-proof and secure.
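Where the serializer offers no such option, one possible workaround for the specific case of `JSON.stringify` is to post-process its output. The sketch below is an assumption-laden illustration rather than the approach this answer recommends: the helper name `jsonForInlineScript` is hypothetical, and it relies on `JSON.stringify` being called without an indent argument, in which case `<`, `>`, `&`, U+2028 and U+2029 can only occur inside string literals and can be swapped for `\uXXXX` escapes without re-parsing.

```javascript
// Characters that matter to the HTML parser or to older JS engines.
const UNSAFE = /[<>&\u2028\u2029]/g;

// Hypothetical helper. Assumes `data` is JSON-serializable (JSON.stringify
// returns undefined for e.g. a bare function, which this sketch does not handle).
function jsonForInlineScript(data) {
  return JSON.stringify(data).replace(
    UNSAFE,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const data = {
  foo: "</script><script>alert(1)</script>",
  bar: "a & b <!-- c ]]>",
};

// What the template would emit in place of <%= JSON.stringify(data) %>.
const emitted = "var data = " + jsonForInlineScript(data) + ";";
console.log(emitted);
```

Because `\uXXXX` is a standard JSON string escape, parsing the result yields the same object, so the stored data is not mutated.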

    SilverlightFox
    • The original post implies a violation of rule #0, since this appears to be direct injection into script tags. Otherwise, your answer makes sense to me. – Gray Sep 28 '15 at 14:18
    • Agreed. My answer was under the assumption that the `var data = '` bit was generated by trusted code and it was only the string literal here that was user controlled. – SilverlightFox Sep 28 '15 at 14:20
    • Apologies, the example in the OP was meant to be as simple as possible. My data is actually stringified JSON, e.g. `var data = {"foo": ""};`, where the template is `var data = <%= JSON.stringify(data) %>;`. – Jackson Sep 28 '15 at 17:30