How to unescape html in javascript?

Question

I'm working with a web service that will give me values like:

var text = "&lt;&lt;&lt;&amp;&amp;&amp;";

And I need to print this so it looks like "<<<&&&" with JavaScript.

But here's the catch: I can't use innerHTML (I'm actually sending these values to a Prototype library that creates text nodes, so it doesn't unescape my raw HTML string). If editing the library is not an option, how would you unescape this HTML?

I need to understand the real deal here: what's the risk of unescaping this type of string? How does innerHTML do it? And what other options exist?

EDIT: The problem is not about using JavaScript's normal escape/unescape, or even the jQuery/Prototype implementations of them, but about the security issues that could come from using any of these... aka "They told me it was pretty insecure to use them".

(For those trying to understand what the heck I'm talking about with innerHTML unescaping this weird string, check out this simple example:)

<html>
<head>
<title>createTextNode example</title>

<script type="text/javascript">

var text = "&lt;&lt;&lt;&amp;&amp;&amp;";
function addTextNode(){
    var newtext = document.createTextNode(text);
    var para = document.getElementById("p1");
    para.appendChild(newtext);
}
function innerHTMLTest(){
    var para = document.getElementById("p1");
    para.innerHTML = text;
}
</script>
</head>

<body>
<div style="border: 1px solid red">
<p id="p1">First line of paragraph.<br /></p>
</div><br />

<button onclick="addTextNode();">add another textNode.</button>
<button onclick="innerHTMLTest();">test innerHTML.</button>

</body>
</html>

How can this question be a duplicate? This question is older than the question that is supposedly duplicated. – ands, Oct 24 '19 at 16:54
You can see the security issues with using innerHTML in this [answer to a similar question](https://stackoverflow.com/a/1395954/6476044). To avoid an XSS vulnerability you should use the [he library](https://github.com/mathiasbynens/he). You can see code examples in another [answer to a similar question](https://stackoverflow.com/a/23596964/6476044). – ands, Oct 24 '19 at 20:02

score 11 · Accepted Answer · edited May 23 '17 at 12:24

Change your test string to the escaped form of <b><<&&&</b> to get a better handle on what the risk is... (or better, the escaped form of <img src='http://www.spam.com/ASSETS/0EE75B480E5B450F807117E06219CDA6/spamReg.png' onload='alert(document.cookie);'> for cookie-stealing spam)

See the example at http://jsbin.com/uveme/139/ (based on your example, using prototype for the unescaping.) Try clicking the four different buttons to see the different effects. Only the last one is a security risk. (You can view/edit the source at http://jsbin.com/uveme/139/edit) The example doesn't actually steal your cookies...

If your text is coming from a known-safe source and is not based on any user input, then you are safe.
If you are using createTextNode to create a text node and appendChild to insert that unaltered node object directly into your document, you are safe.
Otherwise, you need to take appropriate measures to ensure that unsafe content can't make it to your viewer's browser.
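
For the original question (unescaping without innerHTML), one option is to decode the entities with plain string handling and only then hand the result to createTextNode. A minimal sketch, assuming the web service only emits the five entities listed below; `decodeBasicEntities` and `BASIC_ENTITIES` are hypothetical names, not part of Prototype or any other library:

```javascript
// Hypothetical lookup table: only these five entities are decoded.
var BASIC_ENTITIES = {
    '&lt;': '<',
    '&gt;': '>',
    '&amp;': '&',
    '&quot;': '"',
    '&#39;': "'"
};

// Hypothetical helper: decodes the entities above in a single regex pass.
function decodeBasicEntities(str) {
    return String(str).replace(/&(?:lt|gt|amp|quot|#39);/g, function (match) {
        return BASIC_ENTITIES[match];
    });
}

var text = "&lt;&lt;&lt;&amp;&amp;&amp;";
var decoded = decodeBasicEntities(text);
console.log(decoded);

// In a browser, the decoded string can still go into a text node, so any
// markup it contains is displayed as text rather than parsed.
if (typeof document !== 'undefined') {
    var para = document.getElementById("p1");
    if (para) {
        para.appendChild(document.createTextNode(decoded));
    }
}
```

Using a single pass avoids double-decoding, so `&amp;lt;` becomes `&lt;` rather than `<`. Entities outside the table are left untouched; for the full set of named entities, a maintained library such as he (linked in the comments on the question) is probably a better fit.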

Note: As pointed out by Ben Vinegar, using createTextNode is not a magic bullet: using it to escape the string, then using textContent or innerHTML to get the escaped text out and doing other stuff with it, does not protect you in your subsequent uses. In particular, the escapeHtml method in Peter Brown's answer below is insecure if used to populate attributes.

answered Jul 07 '09 at 05:16

Stobor

This is really useful. So, bottom line, if there's anything coming from a user, it should be TextNode?? – DFectuoso Jul 07 '09 at 06:49
@DFectuoso: That's one approach, which works if you don't want them to be able to use any HTML features. If, for example, you want them to be styling their text, you have to figure out how you do that safely... – Stobor Jul 07 '09 at 07:59
Interesting insight into security issues. – Milad Naseri Jan 12 '12 at 03:12
`If you are using createTextNode, you are safe` : NO, according to http://benv.ca/2012/10/2/you-are-probably-misusing-DOM-text-methods/ – user Sep 23 '14 at 06:27
@buffer: Ben quotes my answer out-of-context, which is a little sneaky. However, he is right about something else: using `createTextNode` to build an `escapeHtml` function may be insecure. While none of the answers on this page ever suggested doing that, my phrasing might have made others feel like functions elsewhere on the net which use `createTextNode` are safer than appropriate. I've added a clarification about that. – Stobor Sep 28 '14 at 11:19

PETER BROWN · Answer 2 · 2012-10-05T03:48:43.157

5

A very good read is http://benv.ca/2012/10/4/you-are-probably-misusing-DOM-text-methods/ which explains why the conventional wisdom of using createTextNode is actually not secure at all.

A representative example of the risk, taken from the article above:

function escapeHtml(str) {
    var div = document.createElement('div');
    div.appendChild(document.createTextNode(str));
    return div.innerHTML;
}

var userWebsite = '" onmouseover="alert(\'derp\')" "';
var profileLink = '<a href="' + escapeHtml(userWebsite) + '">Bob</a>';
var div = document.getElementById('target');
div.innerHTML = profileLink;
// <a href="" onmouseover="alert('derp')" "">Bob</a>
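
The text-node trick leaves double quotes untouched, because quotes need no escaping in text content. If the value is destined for a quoted attribute, the quotes have to be escaped as well. A minimal sketch of that idea, assuming the attribute is always written inside quotes; `escapeHtmlAttribute` is a hypothetical helper, not something from the article:

```javascript
// Hypothetical helper: escapes the characters that matter inside a
// quoted attribute value, including both kinds of quote.
function escapeHtmlAttribute(str) {
    return String(str)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

var userWebsite = '" onmouseover="alert(\'derp\')" "';
var profileLink = '<a href="' + escapeHtmlAttribute(userWebsite) + '">Bob</a>';

// The quotes are now entities, so the payload stays inside the href value
// instead of closing it and adding an onmouseover attribute.
console.log(profileLink);
```

This only addresses breaking out of the attribute. It does not check the URL itself, so a value such as a `javascript:` URL would still need separate validation.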

edited Oct 05 '12 at 03:48

answered Oct 05 '12 at 03:43

PETER BROWN


It's not secure specifically in the use-case of building an `escapeHtml` method which is used to populate element attributes. However, his point stands: if you're not 100% sure of the context in which your function is being used, you can't be sure that this function is safe. The use of `createTextNode` properly in a construction like `document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));` is not what he's commenting on... – Stobor Sep 28 '14 at 11:34

score 2 · Answer 3 · answered Jul 07 '09 at 02:13

Try the escape and unescape functions available in JavaScript.

More details : http://www.w3schools.com/jsref/jsref_unescape.asp
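
For context, a quick check of what these functions actually decode. This is only an illustrative sketch: `unescape` handles percent-style sequences such as `%3C`, not HTML entities, so the string from the question passes through unchanged.

```javascript
var text = "&lt;&lt;&lt;&amp;&amp;&amp;";

// unescape() only understands %XX and %uXXXX sequences,
// so the HTML entities are returned as they came in.
var viaUnescape = unescape(text);

// Percent-encoded input is what this family of functions is meant for.
var viaDecode = decodeURIComponent("%3C%3C%3C%26%26%26");

console.log(viaUnescape === text);
console.log(viaDecode);
```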

Anuraj

I'm told that unescaping HTML with that method can lead to some serious security issues... that's kind of my point... – DFectuoso Jul 07 '09 at 02:14
No problem, I did it after you answered... don't downvote this guy! – DFectuoso Jul 07 '09 at 02:32
Escape and unescape functions are now deprecated. See, for example, [this blog entry](http://gotochriswest.com/blog/2011/05/23/escape-unescape-deprecated/) for details. – Ville Oct 02 '11 at 01:00
There are times where you might want to unescape your own code, so there are no security issues there. Anyway, unescape does not always work; it does not unescape `<`, for instance. – raquelhortab Jul 27 '22 at 21:54
The above blog entry link has moved to here: https://cwestblog.com/2011/05/23/escape-unescape-deprecated/ – tresf Jan 12 '23 at 04:19
Note, if anyone uses the polyfill in my previous message, make sure to apply the fixes in the comments, they are required. – tresf Jan 12 '23 at 05:44

Fire Crow · Answer 4 · 2009-07-23T16:06:10.767

Some guesswork, for what it's worth.

innerHTML is literally the browser interpreting the HTML.

So &lt; becomes the less-than symbol because that's what would happen if you put &lt; in the HTML document.

The largest security risk of strings with & is an eval statement; any JSON could make the application insecure. I'm no security expert, but if strings remain strings then you should be OK.

This is another way innerHTML is secure: the unescaped string is on its way to becoming HTML, so there's no risk of it running the JavaScript.

score 1 · Answer 5 · answered Jul 07 '09 at 05:15

As long as your code is creating text nodes, the browser should NOT render anything harmful. In fact, if you inspect the generated text node's source using Firebug or the IE Dev Toolbar, you'll see that the browser is re-escaping the special characters.

give it a

"<script>"

and it re-escapes it to:

"&lt;script&gt;"

There are several types of nodes: Elements, Documents, Text, Attributes, etc.

The danger is when the browser interprets a string as containing script. The innerHTML property is susceptible to this problem, since it will instruct the browser to create Element nodes, one of which could be a script element, or have inline Javascript such as onmouseover handlers. Creating text nodes circumvents this problem.
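
The "re-escaping" described above happens when a text node is serialized back to markup. A rough, DOM-free sketch of that step, assuming the HTML serialization rules for text content (`&`, non-breaking space, `<` and `>` are escaped, quotes are not); `serializeTextContent` is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: approximates how a browser escapes the data of a
// text node when the parent's innerHTML is read back.
function serializeTextContent(data) {
    return String(data)
        .replace(/&/g, '&amp;')
        .replace(/\u00A0/g, '&nbsp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

var asScript = serializeTextContent("<script>");
var asQuoted = serializeTextContent('"quoted"');

console.log(asScript);
console.log(asQuoted);
```

Note that quotes come back unchanged, which is why reading innerHTML from a text node is not a safe way to escape attribute values.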

Jeff Meatball Yang

Although, I couldn't make it do anything bad with `<script>alert('hi');</script>` - for some reason although the script was inserted, it wasn't being run. But the onload for the images was, so I exploited that instead... – Stobor Jul 07 '09 at 05:19
@Stobor - could you show me what you mean? I'm curious... – Jeff Meatball Yang Jul 08 '09 at 03:36
@Jeff: It's been a while, but I only just saw your question. I meant I couldn't get the script on this page to run: http://jsbin.com/onezo - although viewing computed source shows the script tag, it doesn't `alert()`... The alert in my answer works, though. – Stobor Oct 15 '09 at 23:38

score 1 · Answer 6 · answered Aug 31 '11 at 17:10

function mailpage() {
    // Build a mailto: URL; escape() encodes the page title and body text.
    var mail_str = "mailto:?subject= Check out the " + escape(document.title);
    mail_str += "&body=" + escape("I thought you might be interested in the " + document.title + ".\n\n");
    mail_str += escape("You can view it at " + location.href + ".\n\n");
    // Navigating to the mailto: URL opens the user's mail client.
    location.href = mail_str;
}
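
Since `escape` is deprecated (see the comments on the earlier answer), the same link can be built with `encodeURIComponent`. A minimal sketch under that assumption; `buildMailtoUrl` is a hypothetical helper, and the title and address passed in are made-up sample values:

```javascript
// Hypothetical helper: builds a mailto: URL from a page title and address,
// percent-encoding the subject and body with encodeURIComponent.
function buildMailtoUrl(title, pageUrl) {
    var subject = "Check out the " + title;
    var body = "I thought you might be interested in the " + title + ".\n\n" +
               "You can view it at " + pageUrl + ".\n\n";
    return "mailto:?subject=" + encodeURIComponent(subject) +
           "&body=" + encodeURIComponent(body);
}

// Sample values only; in a page these would be document.title and location.href.
var url = buildMailtoUrl("Tom & Jerry", "http://example.com/?a=1&b=2");
console.log(url);
```

The ampersands in the title and in the page address are encoded as `%26`, so the only raw `&` left in the result is the one that separates the subject from the body.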

Jan

The answer I just posted allows you to put the actual page title (with either & or &amp;) in the subject line... and the body of the HTML page will show up in the body of the email. – Jan Aug 31 '11 at 17:12
