
Special characters in the alt attribute of an image in an HTML page must be encoded as HTML entities. So

<img id="formula" alt="A &rarr; B" src="formula.png" />

will work well.

On the other hand, the equivalent JavaScript code will not work

document.getElementById('formula').alt = 'A &rarr; B';

and will produce A &rarr; B instead of A → B.

How can this be done through JavaScript when it is not possible to put the special (unencoded) characters in the source code?

Arseni Mourzenko
  • Google peoples - see also: https://stackoverflow.com/questions/94037/convert-character-to-ascii-code-in-javascript – Andrew Jun 21 '17 at 20:57

4 Answers


JavaScript has its own system for escaping special characters in strings:

document.getElementById('formula').alt = 'A \u2192 B';
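
As a small illustration of how the escape relates to the character and to its decimal code, here is a sketch that runs without a DOM. The `toUnicodeEscape` helper is hypothetical (it is not part of this answer) and only handles characters in the Basic Multilingual Plane.

```javascript
// Hypothetical helper: build the \uXXXX escape for a single BMP character.
function toUnicodeEscape(ch) {
  const hex = ch.charCodeAt(0).toString(16);
  return '\\u' + hex.padStart(4, '0');
}

// The escape, the decimal code and the hex code all name the same character.
const arrow = '\u2192';
console.log(arrow === String.fromCharCode(8594)); // true
console.log((8594).toString(16));                 // 2192
console.log(toUnicodeEscape(arrow));              // \u2192
```

Because the escape is resolved when the string literal is parsed, the source file itself stays pure ASCII.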
s4y
  • 1
    +1. When you are assigning DOM properties, you **don't need HTML-encoding**. If you want the character `→` just paste it in; it's perfectly valid to say `img.alt= 'A → B';`. That only requires you to get the encoding you're saving your page as to match what you're serving it as (best: use UTF-8 for both). The JavaScript string literal escape `'A \u2192 B'` is a good fallback if you can't rely on non-ASCII characters being served properly. – bobince May 05 '10 at 19:16
  • @bobince: That was my first answer, too, but the question explicitly states: "How to do it through JavaScript, when it is not possible to put the special (unencoded) characters in the source code?" Moreover, this solution presupposes the source is a Unicode value, but the question's example uses an HTML entity. Oh, and really, no offence meant, I'm just wondering what to do in such a situation. – Marcel Korpel May 05 '10 at 22:45
  • Yeah, my feeling is there's a common assumption that non-ASCII characters need escaping which is much less often true than people think. (Indeed, the phrase 'special characters' is itself a misnomer when we're talking about 99.9% of all characters...) In any case there will be no difference between setting `innerHTML` to `&rarr;` or `→` because all HTML parsers replace entity references with their text equivalent: you never get a DOM EntityReference node. – bobince May 06 '10 at 14:06
  • @bobince: I think it would be great if authors could comfortably include any character in a document, but I've seen way too many encoding fails to recommend it, and I'd only do it myself if I had control over the website, server, VCS, and editors used by the developers. "Special characters" means every character that's at risk for being broken by bad encoding, everything but the ASCII 95. – s4y May 06 '10 at 15:57
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Decode HTML entities using JavaScript</title>
  </head>
  <body>
    <img id="formula" src="vote-arrow-up-on.png" alt="A &rarr; C">
    <script>
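      // Decode HTML entities by letting the browser's HTML parser do the work.
      function html_entity_decode(str) {
        var p = document.createElement("p");
        // Escape angle brackets first so the input cannot inject markup.
        p.innerHTML = str.replace(/</g, "&lt;").replace(/>/g, "&gt;");
        // Read the parsed text; innerHTML would re-encode &, < and >.
        return p.textContent;
      }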

      var theValue = html_entity_decode('A &rarr; B');
      document.getElementById('formula').title = theValue;
    </script>
  </body>
</html>

Credits to The JavaScript Source.

EDIT: I changed the original code to only use innerHTML on a <p> element, instead of innerHTML and value of a <textarea>.

Marcel Korpel
  • @jweyrich: it's easier to see the effect using the `title` attribute ;) – Marcel Korpel May 05 '10 at 17:52
  • 1
    Setting the HTML of a `textarea` should *not* change its form field `value` according to the standard DOM; the contained HTML maps to the textarea's `defaultValue` property, not `value`. It's a browser quirk that `innerHTML` also affects `value`, and it doesn't happen in Opera. Don't rely on it. You could write `innerHTML` to any other element and read the text node child it ends up with (if non-blank string). Also, avoid `setAttribute`, which is unnecessary in HTML documents, and buggy in IE. – bobince May 05 '10 at 19:13
  • @bobince: FYI, I tested this example in Opera (10.10/Linux) and it worked fine. Also, why is `setAttribute` buggy in IE, in this case (and in cases you want to set a custom `data-…` attribute)? I can't find much about it, apart from [your post](http://stackoverflow.com/questions/748941/setattribute-onclick-and-cross-browser-compatability/749253#749253) explaining why setting event handlers using `setAttribute` doesn't work as expected (the same is true for e.g. the `style` object). If you want me to open a separate question about this, I'll be happy to do so. – Marcel Korpel May 11 '10 at 01:09
  • 1
    IE<8 treats `el.getAttribute(someattr)`/`el.setAttribute(someattr, ...)` as being the same as property access, `el[someattr]`. Consequently, (1) any property whose value isn't a string will expose something that isn't a string. For integer properties you sometimes don't notice as weak typing hides it from you, but booleans, event handler functions and `style` will often trip you up. (2) any property whose name is different from the corresponding attribute will fail. That's `htmlFor`, `className`, and any attribute name made from multiple or hyphenated words. – bobince May 11 '10 at 22:54
  • 1
    (3) any property which has a different meaning to its attribute will behave strangely. For example `href` will return the full resolved URL in a link, not the original, possibly-relative version that was actually used in the attribute value, and `value` returns a form field's current value, and not the content of the `value` attribute (which actually maps to `defaultValue`). Also (4) custom-attributes can shadow real properties, so writing `
    ` might confuse a script into thinking it's a text node.
    – bobince May 11 '10 at 22:54
  • 1
    (This is another reason custom attributes should be avoided, but if you must use them then yes, you must use `getAttribute` exclusively to access them, and if you've got any sense you'll limit this use to attributes prefixed `data-`.) – bobince May 11 '10 at 22:55

Here's an alternative if you can't save your file using a Unicode format:

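function decodeHTML(str) {
    // Replace each decimal numeric reference (e.g. &#8594;) with its character.
    return str.replace(/&#(\d+);?/g, function (match, code) {
        // fromCharCode only covers code units up to 0xFFFF.
        return String.fromCharCode(parseInt(code, 10));
    });
}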

However, this requires you to use the decimal numeric representation of the entity, in this case &#8594;.
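
If named entities or hexadecimal references also need to be handled without a DOM, one possible extension looks like the sketch below. The `decodeEntities` name and the `NAMED_ENTITIES` table are assumptions made for illustration; the table is deliberately tiny, whereas HTML defines over two thousand named entities, so a real project would need a complete list.

```javascript
// Hypothetical, deliberately small lookup table (name -> code point).
const NAMED_ENTITIES = {
  rarr: 0x2192, larr: 0x2190, uarr: 0x2191, darr: 0x2193,
  amp: 0x26, lt: 0x3C, gt: 0x3E, quot: 0x22
};

function decodeEntities(str) {
  // Matches &#xHEX; or &#DECIMAL; or &name; (the semicolon is required here).
  return str.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, function (match, body) {
    let codePoint;
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    } else if (Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)) {
      codePoint = NAMED_ENTITIES[body];
    }
    // Leave unknown names and out-of-range numbers untouched.
    if (codePoint === undefined || codePoint > 0x10FFFF) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeEntities('A &rarr; B') === 'A \u2192 B');   // true
console.log(decodeEntities('A &#x2192; B') === 'A \u2192 B'); // true
```

The replacement runs in a single pass, so an input such as `&amp;rarr;` decodes to the literal text `&rarr;` rather than being decoded twice.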

jweyrich
  • IMHO you've suggested the *easiest workaround*. Even if it is a workaround, and not an answer, I accept it and that's the solution I will use for my code (converting entities like `&uarr;` to `&#8593;` on server side). – Arseni Mourzenko May 05 '10 at 18:03
  • 1
    I hate to be that guy, but please take a look at my answer. – s4y May 05 '10 at 18:04
  • @Sidnicious No please, your solution is the best IMO! +1 from me! @MainMa feel free to accept his answer instead. I'd go for it :) – jweyrich May 05 '10 at 18:07
  • @Sidnicious: well, it may be an alternative solution. Both are good, I think, so I uprated yours, still letting the answer of jweyrich as accepted, because it seems more straightforward, and does not require to use JavaScript character codes (different from HTML codes). – Arseni Mourzenko May 05 '10 at 18:09
  • JavaScript string literal escapes are in hex, like `&#x2192;` escapes in HTML. If you really want to use the decimal codes you can: `'A '+String.fromCharCode(8594)+' B'`. – bobince May 05 '10 at 19:18
  • @MainMa: Like bobince said, the only difference is decimal/hexadecimal. It's trivial to convert, you can even use JavaScript: `(8594).toString(16) → "2192"`! – s4y May 06 '10 at 16:00

In that particular case, you don't have to encode special HTML characters in JavaScript.

The W3C validator should not complain about this (just tested it) and the document should validate. If not, post your code and I'll update my answer.

AlexV