In Python 3.6+, is there a succinct way to encode output for JavaScript contexts?

By that I mean: if I take any unsanitized input string, encode it, and substitute the result for VALUE below, the page should not be vulnerable to XSS. The input must not be able to break out of the JavaScript string or out of the surrounding HTML.

<!DOCTYPE html>
<html>
  <head>
    <script>
      var a = 'VALUE';
    </script>
  </head>
</html>

The official OWASP cheat sheet for XSS prevention states that all non-alphanumeric characters must be hex-escaped. It provides a Java implementation, but the only Python implementation I could find has not been updated since 2010. So I wrote my own:

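import string

# Only ASCII letters and digits pass through unescaped.
_SAFE_CHARS = frozenset(string.ascii_letters + string.digits)

def as_js_in_html(value):
    """Escape value for use inside a quoted JavaScript string in a <script> block."""
    parts = []
    for char in value:
        code = ord(char)
        if char in _SAFE_CHARS:
            parts.append(char)
        elif code <= 0xFF:
            parts.append('\\x%02x' % code)
        elif code <= 0xFFFF:
            parts.append('\\u%04x' % code)
        else:
            # JavaScript has no \UXXXXXXXX escape, so emit a UTF-16 surrogate pair.
            code -= 0x10000
            parts.append('\\u%04x\\u%04x' % (0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)))
    return ''.join(parts)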

Is there a better way?

ti7
Joseph238
  • `json.dumps(value)` should do what you want. – Barmar Oct 28 '22 at 21:34
  • `json.dumps(value)` is awesome! Unfortunately I can still break out of the HTML though. `json.dumps("")` returns the same string, which breaks out of the HTML successfully, as you can see in this [JSFiddle](https://jsfiddle.net/joseph_white3/o7mrb6xw/) which displays an "XSS" alert. – Joseph238 Oct 28 '22 at 21:44
  • I always wondered why PHP's `json_encode()` escapes forward slashes. I guess that explains it (`<\/script>` won't break out). Unfortunately, it doesn't look like `json.dumps()` has a similar option. – Barmar Oct 28 '22 at 21:47
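
The idea raised in the comments above can be sketched as follows. This is a minimal sketch, not a vetted sanitizer: it assumes the value goes inside an inline <script> block, keeps `json.dumps` for the JavaScript quoting, and then rewrites `<`, `>`, and `&` as `\uXXXX` escapes so the output can never contain `</script>` or `<!--`. The helper name `js_string_for_script` is hypothetical. It relies on the default `ensure_ascii=True`, which already escapes non-ASCII characters.

```python
import json

# Characters that could close the <script> element or open an HTML comment.
# Inside a JSON/JavaScript string, the \uXXXX forms decode to the same text.
_SCRIPT_ESCAPES = {
    ord('<'): '\\u003c',
    ord('>'): '\\u003e',
    ord('&'): '\\u0026',
}

def js_string_for_script(value):
    """Return a double-quoted JavaScript string literal for use in inline <script>.

    Hypothetical helper: the result already includes the surrounding quotes,
    so the template should read `var a = VALUE;` rather than `var a = 'VALUE';`.
    """
    # json.dumps handles quotes, backslashes, newlines, and control characters.
    literal = json.dumps(value)
    # These three characters only ever occur inside the string's content,
    # so replacing them cannot change the structure of the literal.
    return literal.translate(_SCRIPT_ESCAPES)

payload = "</script><script>alert('XSS');</script>"
print(js_string_for_script(payload))
```

Because the escapes are valid JSON as well as valid JavaScript, the original string is recovered unchanged when the literal is evaluated.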

1 Answer

One solution is to use a CDATA section, which is valid in XML (and therefore in XHTML). A CDATA section opens with <![CDATA[, continues until ]]>, and keeps the input from breaking out of the markup. This reduces the amount of escaping code you have to write yourself. In the example below, VALUE is an XSS payload that fails because everything inside the CDATA section is treated as character data, so it can't break out of the HTML:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
  <script>
  // <![CDATA[
    var a = "</script><script>alert('XSS');</script>";
    console.log(a);
  // ]]>
  </script>
</head>
</html>

As explained on the MDN docs:

"When inside a CDATA section, the symbols < and & don't need escaping as they normally do." This means we no longer have to HTML-escape the input.

"[A CDATA Section] will only work with XML, not HTML documents (as HTML documents do not support CDATA sections)."

XHTML documents must be served with Content-Type: application/xhtml+xml. Otherwise, "browsers parse those documents using HTML parsers rather than XML parsers."
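
As an illustration of the Content-Type requirement, here is a minimal sketch of a WSGI application that serves a page with the XML media type. The names `XHTML_PAGE` and `app` and the page content are assumptions made for the example; any framework that lets you set the response header would do the same job.

```python
# Placeholder page used only for this example.
XHTML_PAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">\n'
    '<head><title>Example</title></head>\n'
    '<body><p>Served as XHTML</p></body>\n'
    '</html>\n'
)

def app(environ, start_response):
    """Minimal WSGI application that sends the page as application/xhtml+xml."""
    body = XHTML_PAGE.encode('utf-8')
    headers = [
        # This header is what makes browsers use the XML parser.
        ('Content-Type', 'application/xhtml+xml; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]
```

For a quick local check, the standard library's `wsgiref.simple_server.make_server('', 8000, app).serve_forever()` can host it; the port number is arbitrary.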

Caveats:

  1. You must serve the XHTML document with Content-Type: application/xhtml+xml, or this won't work.

  2. You still need something like json.dumps to keep newlines and quotation marks from breaking out of the JavaScript string literal.

  3. Any CDATA closing sequence, ]]>, must be removed from the unsanitized input or replaced.

  4. CDATA sections were previously made obsolete in the standard, then added back due to web breakage. So this technique may be outdated. Please comment if you have any info about whether this is considered good practice.
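
Caveats 2 and 3 can be combined in a few lines. This is a minimal sketch under the assumptions above (the document is served as XHTML and the script body is wrapped in a CDATA section); the helper name `js_string_for_cdata` is hypothetical. Instead of removing `]]>` from the input, it rewrites the closing `>` as a `\u003e` escape, which JavaScript decodes back to the original character.

```python
import json

def js_string_for_cdata(value):
    """Return a JavaScript string literal that is safe inside <![CDATA[ ... ]]>.

    Hypothetical helper: the result includes its own double quotes, so the
    template should read `var a = VALUE;`.
    """
    # Caveat 2: json.dumps escapes quotes, backslashes, newlines, and other
    # control characters, so the input cannot end the string literal.
    literal = json.dumps(value)
    # Caveat 3: "]]>" would close the CDATA section early. It can only occur
    # inside the string's content, so escaping the ">" keeps the value intact.
    return literal.replace(']]>', ']]\\u003e')

print(js_string_for_cdata('x]]></script><script>alert("XSS");</script>'))
```

Escaping rather than deleting the sequence means the JavaScript variable ends up holding exactly the string the user supplied.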

Joseph238