Render encoded HTML Characters

Question

This is possibly an odd query.

As part of a JavaScript component on a page, I am displaying the titles of several blog posts pulled from WordPress. The WP site is maintained by a marketing team. Occasionally they use characters in a title that render fine on the WP site but arrive HTML-encoded once scraped (&, ' and - in particular). Although the data comes from a trusted source, we need to keep some form of encoding to help prevent XSS; however, we would also like these characters to render correctly. The JavaScript function decodeURIComponent() renders the characters fine but unfortunately leaves us open from a security perspective.

Has anyone encountered this sort of issue before? Any suggestions for libraries or approaches to get around this would be greatly appreciated.

An example of the sort of input we need to work with is as follows:

NN &#8211; Web &#8211; Site &#8211; Test Article &#038; stuff ’ &#038; &#8211;

what do you mean that it leaves you open from a security perspective? — Flash Thunder, Mar 06 '19 at 10:12
Not escaping special characters leaves a potential XSS vulnerability. So, for example if one of our marketing colleagues accidentally wrote a post entitled `NN Test Post 4 – 5th December ` this would execute — Jonny, Mar 06 '19 at 10:13
And what code have you tried to fix this very common issue? No libraries are needed BTW, but a small sample section of code would be a huge help. — Tigger, Mar 06 '19 at 10:13
The problem with your example is that it should and does simply render correctly, so it's not an example of the problem. -> https://jsfiddle.net/27mjLvda/ — Flash Thunder, Mar 06 '19 at 10:18
We haven't tried to fix this issue; as mentioned, the only workable option we've seen so far presents a security risk. — Jonny, Mar 06 '19 at 10:24
@Jonny What do you _want_ to happen when you get a post titled __"NN Test Post 4 – 5th December "__? From your post it sounds like it already displays the title correctly (with the special characters properly escaped), so what else do you want to happen? — Mr Lister, Mar 06 '19 at 10:44

score 0 · Answer 1 · answered Mar 06 '19 at 10:29

As answered in this post, you can safely decode the text by using DOMParser.

Code sample from the original answer (in case the link breaks):

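var encodedStr = 'hello &amp; world';

// Parse the string as an HTML document; scripts in a document
// created by DOMParser are not executed.
var parser = new DOMParser();
var dom = parser.parseFromString(
   '<!doctype html><body>' + encodedStr,
   'text/html');
// textContent returns the decoded text with any markup stripped.
var decodedString = dom.body.textContent;

console.log(decodedString);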

The linked post also notes that scripting is disabled in documents parsed by DOMParser, so decoding the text this way does not execute any code injected into the title.
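Where DOMParser is not available (for example, if the titles are processed outside the browser), the numeric references in the sample title can also be decoded with a plain string replacement. The following is only a sketch and is not taken from the linked post: `decodeEntities` and `NAMED_ENTITIES` are hypothetical names, and only five named entities are handled; anything else is left untouched.

```javascript
// Hypothetical helper (an assumption, not from the linked post).
// Decodes numeric character references such as &#8211; or &#x2019;
// and a small, deliberately incomplete set of named entities.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

function decodeEntities(str) {
  return String(str).replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave null or out-of-range references as they are instead of throwing.
      return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Only own properties count, so "&constructor;" and similar stay untouched.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities('NN &#8211; Web &#038; stuff')); // NN – Web & stuff
```

The result is plain text, not markup. To keep the XSS protection, it should still be written to the page with textContent (or escaped again) and never assigned to innerHTML.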

score 0 · Answer 2 · answered Mar 06 '19 at 10:43

Without seeing real sample code, it is impossible to confirm either your security concerns or the rendering errors.

As I stated in a comment above, this is a common issue. I personally think the question should also be closed, but here is a sample of a possible fix.

Drop your XSS test string into the input below; because the output is written with textContent, it is displayed as text and not executed.

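var out = null;
// Copy the input's current value into the output element as plain text.
// textContent never parses its value as HTML, so markup is shown literally.
function garbage(e) {
  if (out) {
    out.textContent = this.value;
  }
}
// Wire up the elements once the page has loaded.
window.onload = function() {
  out = document.getElementById("out");
  var d = document.getElementById("in");
  if (d) {
    d.addEventListener("keyup", garbage, false);
  }
};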

Garbage in:<br />
<input id="in" type="text" />
<br />
Garbage out:<br />
<div id="out"></div>
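If the title has to be placed into an HTML string instead of being assigned through textContent, the special characters can be escaped by hand first. The following is a minimal sketch under that assumption; `escapeHtml` is a hypothetical helper name and is not part of the code above.

```javascript
// Hypothetical helper (an assumption for illustration).
// Escapes the five characters that are significant in HTML text and
// quoted attribute values. The ampersand is replaced first so that the
// entities produced by the later replacements are not escaped again.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<script>alert(1)</script>'));
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Assigning to textContent remains the simpler option when a DOM element is at hand, since the browser then does the equivalent escaping itself.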
