
I need to decode html in javascript. e.g.:

var str = 'apple &amp; banana';
var strDecoded = htmlDecode(str); // I expect 'apple & banana'

There is no guarantee that the given str is already encoded, and the common jQuery and DOM tricks are vulnerable to XSS:

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;'; // if you see 1 alerted, it means it is XSS vulnerable
var strDecoded; // I wish to get: &</textarea><img src=x onerror=alert(1)>ハローワールド

strDecoded = $('<div/>').html(attackStr).text(); // vulnerable in all browsers

strDecoded = $('<textarea/>').html(attackStr).text(); // vulnerable in IE 9 and Firefox


var dv = document.createElement('div');
dv.innerHTML = attackStr; // vulnerable in all browsers
strDecoded = dv.innerText;

var ta = document.createElement('textarea');
ta.innerHTML = attackStr; // vulnerable in IE 9 and Firefox
strDecoded = ta.value;

Is there any XSS-safe way to html-decode?

daghan
  • What is it that you are trying to accomplish, exactly? The code that you show doesn't do HTML decoding at all, but HTML parsing. – Guffa Nov 03 '14 at 09:42
  • Use innerText or jQuery .text() method instead of innerHTML/.html() – Alex Nov 03 '14 at 09:46
  • hopefully clarified the question – daghan Nov 03 '14 at 10:11
  • @daghan, how are you obtaining the string that might be malicious? That could point the way to a best Answer. – vernonner3voltazim Nov 03 '14 at 14:29
  • @vernonner3voltazim, it is user input, which sometimes arrives encoded and sometimes unencoded – daghan Nov 03 '14 at 15:17
  • If it is user input via a textbox or equivalent, then you can completely control what gets entered, thereby preventing much need to "decode" it afterward. See http://stackoverflow.com/questions/25842070/how-to-type-mixing-of-caps-and-small-letter-in-same-textbox-if-i-set-default-up/25843571#25843571, and especially the last part of my Answer there. – vernonner3voltazim Nov 03 '14 at 17:04
  • @vernonner3voltazim Our business requirement: we need a decoder that runs on ANY string without executing JavaScript from the input. We don't have the luxury of sanitizing what goes in. – daghan Nov 04 '14 at 09:35
  • @daghan, then I can't offer anything better. You may have to write one from scratch. But I suspect, as you do that, you will discover some things that malicious code-strings have in common, that you could more-efficiently block at the input stage (and not all business requirements stay set in stone when new relevant factors are discovered). Good luck! – vernonner3voltazim Nov 04 '14 at 09:53

6 Answers


Taking a mix of your code and the highest-voted (not the accepted) answer at HTML Entity Decode, how about this:

var decodeEntities = (function() {
  // this prevents any overhead from creating the object each time
  var element = document.createElement('textarea');

  function decodeHTMLEntities (str) {
    if(str && typeof str === 'string') {
      str = str.replace(/</g,"&lt;");
      str = str.replace(/>/g,"&gt;");
      element.innerHTML = str;
      str = element.textContent;
      element.textContent = '';
    }

    return str;
  }

  return decodeHTMLEntities;
})();

Fiddle here: http://jsfiddle.net/ursu67z6/

You could also have a look at https://github.com/mathiasbynens/he. I haven't gone through it myself, but it might handle some cases better. I expect that if you are only decoding rather than encoding, the DOM-based approach is better.
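The safety of decodeEntities rests on the two replace calls that run before innerHTML: once every literal < and > has become &lt; or &gt;, the HTML parser has no tag delimiter left, so it cannot build an element such as <img onerror=...> and can only turn entities into text. The DOM half only runs in a browser, but the escaping half can be pulled out and checked on its own. A minimal sketch; the name escapeAngleBrackets is hypothetical:

```javascript
// Hypothetical helper: the pre-escaping step from decodeEntities, isolated.
// After this runs, the string contains no "<" or ">", so an HTML parser fed
// with it can only produce text; entities that were already present
// ("&amp;", "&#x30cf;", ...) are left untouched for the parser to decode.
function escapeAngleBrackets(str) {
  return str.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

var attackStr = "&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;";
var escaped = escapeAngleBrackets(attackStr);

console.log(escaped);
// &amp;&lt;/textarea&gt;&lt;img src=x onerror=alert(1)&gt;&#x30cf;
```

Because the original &amp; is not touched, decoding the escaped string still yields a single & rather than &amp;, so only one level of decoding happens.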

Chris Lear

DOMPurify is a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. It's written in JavaScript and works in all modern browsers (Safari, Opera (15+), Internet Explorer (9+), Firefox and Chrome - as well as almost anything else using Blink or WebKit). It doesn't break on IE6 or other legacy browsers. It simply does nothing there.

DOMPurify is written by security people with a vast background in web attacks and XSS. Fear not.

I've tested and used DOMPurify, and it's really good at sanitizing untrusted data on the client side. Using it is very simple.

Include purify.js:

<script type="text/javascript" src="purify.js"></script>

Then pass your untrusted string to DOMPurify.sanitize:

var attackStr = '</textarea><img src=x onerror=alert(1)>';
var clean = DOMPurify.sanitize(attackStr);

The output will be the following:

<img src="x">

You can test your XSS payloads here: https://cure53.de/purify

Source code, examples and documentation can be found at https://github.com/cure53/DOMPurify

Mehmet Ince
  • I do not want to sanitize. I want to keep the string as it is and just apply one level of decoding to it. Thus if my string is it should stay as is; if it is &lt;, it should become <, etc. Thanks for the library anyway. – daghan Nov 06 '14 at 12:21
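The "one level of decoding" requirement rules out the tempting shortcut of chaining replace() calls, because the output of one replacement is fed into the next. A minimal sketch comparing the two; chainedDecode, oneLevelDecode and the five-entry table are hypothetical and deliberately small:

```javascript
// Hypothetical five-entry table, kept small so the comparison stays short.
var NAMED = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'" };

// Pitfall: chained replace() calls let the output of one step feed the next,
// so "&amp;lt;" becomes "&lt;" and then "<" (two levels of decoding).
function chainedDecode(str) {
  return str
    .replace(/&amp;/g, "&")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'");
}

// Single pass: one regex scans the input once, and scanning resumes after
// each match, so a replacement's output is never re-examined.
function oneLevelDecode(str) {
  return str.replace(/&(?:amp|lt|gt|quot|#39);/g, function (m) {
    return NAMED[m];
  });
}

console.log(chainedDecode("&amp;lt;"));  // <
console.log(oneLevelDecode("&amp;lt;")); // &lt;
```

Any decoder, DOM-based or not, needs this single-pass property to meet the requirement: the & produced from &amp; must never be read again as the start of another entity.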

Here is a clean solution that does not require injecting the HTML anywhere. Copy both of these functions somewhere in your code: http://phpjs.org/functions/html_entity_decode/ and http://phpjs.org/functions/get_html_translation_table/

You'll have to remove "this" in "html_entity_decode" on line 26.

console.log( html_entity_decode('&amp;</textarea><img src=x onerror=alert(1)>') );
// &</textarea><img src=x onerror=alert(1)>

Cheers.
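If you would rather not depend on the phpjs files, the same string-only idea can be sketched in a few lines, including decimal and hexadecimal numeric references such as the &#x30cf; in the question's test string. This is a minimal sketch under stated assumptions: the name decodeEntitiesNoDom is hypothetical, the named-entity table has only six entries (a real decoder needs the full HTML named character reference list, which libraries such as he ship), and every reference must end with a semicolon.

```javascript
// Hypothetical six-entry table; a complete decoder needs the full list of
// HTML named character references.
var NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0"
};

function decodeEntitiesNoDom(str) {
  if (typeof str !== "string") return str;
  // One pass over the input: &#1234; (decimal), &#x30cf; (hex) or &name;
  return str.replace(/&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g,
    function (match, dec, hex, name) {
      if (dec !== undefined || hex !== undefined) {
        var code = dec !== undefined ? parseInt(dec, 10) : parseInt(hex, 16);
        // Leave NUL, lone surrogates and values past U+10FFFF as written
        if (code === 0 || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)) {
          return match;
        }
        return String.fromCodePoint(code);
      }
      // Unknown names are left untouched rather than guessed
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
        ? NAMED_ENTITIES[name]
        : match;
    });
}

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
console.log(decodeEntitiesNoDom(attackStr));
// &</textarea><img src=x onerror=alert(1)>ハローワールド
```

Because no HTML is ever parsed, the markup in the input cannot execute; it simply comes back as text.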

-- EDIT --

Your textarea trick looks good; does it cover all your use cases?

The only other JavaScript solution I can think of is to use a sandboxed, same-domain iframe. It gives me good results, but it only works in recent web browsers. Here is the code, just in case:

function safeHtmlDecode(str, callback)
{
    var sameDomainBlankPage = document.location.href; // This should be a blank HTML page located on the same domain
    // No "allow-scripts" in the sandbox, so scripts and inline handlers cannot run inside the iframe
    var $iframe = $('<iframe sandbox="allow-same-origin"/>').attr("src", sameDomainBlankPage);
    $iframe.on("load", function() {
        var body = $iframe.contents()[0].body;
        body.innerHTML = str;
        callback(body.innerText);
        $iframe.remove(); // Clean up the temporary iframe
    });
    $("body").append($iframe);
}
$(document).ready(function(){
    var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
    safeHtmlDecode(attackStr, function(htmlString) {
        console.log( htmlString );
    });
});
Romain
  • Nice reference, but for the input &amp;&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9; I need the output &ハローワールド, and it does not work. get_html_translation_table is not comprehensive enough. – daghan Nov 09 '14 at 12:53

If you want to display the content safely, use innerText or jQuery's .text() method instead of innerHTML/.html().
Ruchit Patel

The best I could get so far:

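function htmlDecode(str){
    if(typeof str !== "string") return str;
    // Escape every tag delimiter first so the parser can never build an element
    str = str.replace(/</g,"&lt;");
    str = str.replace(/>/g,"&gt;");
    // A detached textarea decodes the remaining entities into plain text
    var ta = document.createElement("textarea");
    ta.innerHTML = str;
    return ta.value;
}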

//test:
var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
alert(htmlDecode(attackStr)); // &</textarea><img src=x onerror=alert(1)>ハローワールド
daghan

You can use jQuery functions like the ones below to encode or decode the input string:

function htmlEncode(value){
  return $('<div/>').text(value).html();
}

function htmlDecode(value){
  return $('<div/>').html(value).text();
}

htmlDecode('&lt;b&gt;test&lt;/b&gt;')
// result "<b>test</b>"

htmlDecode('test')
// result "test"

In this code:

  1. I create a div that is not actually attached to the page.
  2. I pass the input string to the htmlDecode function.
  3. jQuery automatically encodes/decodes the string.
  4. I return the new HTML/text.
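For the encoding direction, the div round-trip can also be approximated without jQuery or the DOM. When a text node is serialized back to HTML, browsers escape &, <, > and the no-break space (U+00A0), so a sketch for plain strings only needs those four replacements. The name htmlEncodeNoDom is hypothetical:

```javascript
// Hypothetical DOM-free counterpart to htmlEncode, for plain strings.
// Mirrors how a text node's content is escaped when serialized to HTML:
// "&", "<", ">" and U+00A0 are replaced; quotes are left alone.
function htmlEncodeNoDom(value) {
  return String(value)
    .replace(/&/g, "&amp;")   // must run first, see note below
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\u00A0/g, "&nbsp;");
}

console.log(htmlEncodeNoDom("<b>test</b>")); // &lt;b&gt;test&lt;/b&gt;
console.log(htmlEncodeNoDom("test"));        // test
```

The & replacement has to come first; otherwise the & in a freshly produced &lt; would be escaped again, giving &amp;lt;.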

Hope this helps!

Ashish Panchal