
ECMAScript treats strings as UTF-16.

If I write a program in my text editor, I assume it will most likely be saved in the editor's default encoding, UTF-8.

console.log('')

So how does this "work"? Does it work because UTF-16 is directly compatible with UTF-8, which, in turn, is directly compatible with ASCII?
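For comparison, here is a quick Node.js sketch (assuming the built-in Buffer API) showing that the same character has different byte sequences in the two encodings, while ASCII characters are unchanged in UTF-8:

// The same character encoded two ways: the byte sequences differ,
// so UTF-16 is not byte-compatible with UTF-8 beyond ASCII.
console.log(Buffer.from('é', 'utf8'))    // <Buffer c3 a9>
console.log(Buffer.from('é', 'utf16le')) // <Buffer e9 00>
console.log(Buffer.from('A', 'utf8'))    // <Buffer 41> (ASCII is a byte-for-byte subset of UTF-8)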

Ben Aston

1 Answer


See section 10.1, Source Text:

ECMAScript code is expressed using Unicode. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars. The actual encodings used to store and interchange ECMAScript source text is not relevant to this specification. Regardless of the external source text encoding, a conforming ECMAScript implementation processes the source text as if it was an equivalent sequence of SourceCharacter values, each SourceCharacter being a Unicode code point.
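A small sketch of what that means in practice (the emoji is just an example code point): the parser works on code points, and the resulting String value is a sequence of UTF-16 code units.

console.log('😀' === '\u{1F600}')             // true: literal character and code point escape are the same source character
console.log('😀'.length)                      // 2 (two UTF-16 code units: a surrogate pair)
console.log('😀'.codePointAt(0).toString(16)) // "1f600"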

Bergi
  • So JavaScript engines typically have within them a text encoding detection capability that permits either conversion of source text to UTF-16, or the treatment of differently encoded source text as UTF-16? – Ben Aston Apr 02 '20 at 10:59
  • @52d6c6af JS engines just take a Unicode sequence as input; they don't have any encoding detection. That's part of the embedding, e.g. the HTTP code in the browser and [for Node.js I actually have no idea](https://stackoverflow.com/q/10125141/1048572). The Unicode sequence is fed to the parser, which [constructs `String` values from literals](https://www.ecma-international.org/ecma-262/10.0/index.html#sec-literals-string-literals) (which includes things like converting escape sequences; see the sketch below). – Bergi Apr 02 '20 at 11:13
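
A minimal sketch of the escape-sequence conversion mentioned in the comment (the specific characters are just examples):

// Escape sequences are resolved when the parser constructs the String value,
// so these literals all denote the same UTF-16 code units.
console.log('\u00E9' === 'é')               // true
console.log('\uD83D\uDE00' === '\u{1F600}') // true (surrogate pair escape vs code point escape)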