
This string: <a onclick="doit('&#39;')">...</a> is received from the server-side and needs to be set as the inner HTML of an element.

When I use Element#setInnerHTML the string is converted to <a onclick="doit(''')">...</a>, i.e. the HTML entity is unescaped to the character it represents.

How can I perform the inner HTML assignment with no entity conversions?

Clarification: The inner HTML assignment only unescapes entities inside attribute values.

Eliran Malka
Ali Shakiba
  • try escaping the ampersand: `&` --> `&amp;`, so the result is `...` before the HTML injection. – Eliran Malka Jul 18 '12 at 15:37
  • @EliranMalka thanks, it seems that setInnerHTML only unescapes entities inside attributes, so when I apply your solution the entities inside attributes are corrected but those outside are broken instead (`&amp;` outside an attribute is converted to `&amp;amp;` and is displayed as `&amp;` instead of `&`) – Ali Shakiba Jul 18 '12 at 16:01
  • why not use escaping only for the cases in question (only for attribute values)? – Eliran Malka Jul 18 '12 at 17:08
  • because this html fragment is also included in html pages (as well as ajax load) and escaping needs to be consistent on server. – Ali Shakiba Jul 19 '12 at 04:04

1 Answer


The setInnerHTML() implementation itself poses no issue, since its only role is to assign a property value to the underlying JS object, as can be seen by examining Element's source code:

public final native void setInnerHTML(String html) /*-{
    this.innerHTML = html || '';
}-*/;

The problem lies within the browser, which innocently follows the "Character entity references" section of the HTML Document Representation chapter of the HTML specification and parses your entities. Character references are allowed (and thus get parsed) inside attribute values.

From the specification:

Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
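To make the quoted rule concrete, here is a toy sketch of the single decoding pass applied to an attribute value. It is not the browser's real parser: the class name `AttrDecodeSketch` and the method `decodeOnce` are hypothetical, and only decimal numeric references and `&amp;` are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttrDecodeSketch {
    // Matches decimal numeric character references (e.g. &#39;) and the named reference &amp;.
    // Assumption: a real HTML parser knows far more references than these two kinds.
    private static final Pattern REF = Pattern.compile("&(?:#(\\d+)|amp);");

    /** Toy stand-in for the one decoding pass applied to an attribute value. */
    public static String decodeOnce(String attrValue) {
        Matcher m = REF.matcher(attrValue);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the reference unchanged.
            out.append(attrValue, last, m.start());
            if (m.group(1) != null) {
                // Numeric reference: emit the character with that code point.
                out.appendCodePoint(Integer.parseInt(m.group(1)));
            } else {
                // &amp; decodes to a bare ampersand, which is not decoded again.
                out.append('&');
            }
            last = m.end();
        }
        out.append(attrValue, last, attrValue.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("doit('&#39;')"));     // the entity is decoded
        System.out.println(decodeOnce("doit('&amp;#39;')")); // the entity text survives
    }
}
```

Because the output is not scanned a second time, an ampersand written as `&amp;` protects the entity that follows it.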

Solution

On the server side (or via a filter or a designated client proxy), escape all special characters inside attribute values with the corresponding HTML entities, e.g.:

<a onclick=\"doit('&amp;#39;')\">...</a>
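A minimal sketch of that escaping step in plain Java, assuming the server builds the markup as a string. The class name `AttrEscapeSketch` and the helper `escapeAttr` are hypothetical, not part of GWT or any library; if your stack already ships an HTML-escaping utility, it may be the better choice.

```java
public class AttrEscapeSketch {
    /** Escapes characters that are special inside a double-quoted HTML attribute value. */
    public static String escapeAttr(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;   // keeps entities from being decoded
                case '"': out.append("&quot;"); break;  // keeps the attribute from closing early
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The handler text that should remain once the browser has decoded the attribute.
        String handler = "doit('&#39;')";
        String html = "<a onclick=\"" + escapeAttr(handler) + "\">...</a>";
        System.out.println(html);
    }
}
```

The escaping is applied to the attribute value only, not to the whole fragment, so text nodes elsewhere in the markup are left as they are.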

References on W3C

Eliran Malka
  • 1. I think you have misunderstood the spec. It means that a **literal `&`** should be escaped to `&amp;`, but the `&` in `&#39;` is not literal. 2. The word **also** in the sentence you quoted from the spec refers to the previous sentence, which states generally that a literal `&` should be escaped and then adds that this applies in attributes too, so special treatment for attributes cannot be concluded. 3. If I do so I will have broken HTML (incorrect attribute values) when it is served as an HTML page instead of an AJAX response. – Ali Shakiba Jul 19 '12 at 03:50
  • **1.** as the html entity appears inside a string, the ampersand becomes literal. **2.** you can, of course, escape all typical cases (e.g. in text nodes), but you didn't mention having a problem in other situations. **3.** treat json type response differently, what's the problem? – Eliran Malka Jul 19 '12 at 06:20