Many sources state that, for example, "C# uses UTF-16 for its strings" (link). Technically, what does this mean? My source file is just text. Say I write a simple C# app in Notepad++: how that text is represented as bytes on disk after I save the file depends on Notepad++'s encoding setting, so that is probably not what people mean. Does it mean that:
- The language specification requires/recommends that the compiler input be encoded as UTF-16?
- The standard library functions are encoding-aware and treat the strings as UTF-16, for example `String`'s `operator[]` (which returns the n-th character and not the n-th byte)?
- Once the compiler produces an executable, the strings stored inside it are in UTF-16?
I've used C# as an example, but this question applies to any language of which one could say that it uses encoding Y for its strings.
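To make the distinction I'm asking about concrete, here is a minimal sketch. It is written in Java rather than C#, on the assumption that Java's `String` is described as UTF-16 in the same sense. The class name `StringEncodingDemo` and the helpers `utf8Bytes` and `utf16Bytes` are made up for illustration. The literal uses `\u` escapes so that the result does not depend on how the editor saved the source file.

```java
import java.nio.charset.StandardCharsets;

public class StringEncodingDemo {
    // Bytes needed if the text were written to disk as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed if the text were written to disk as UTF-16 (little-endian, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        // 'a', 'é' (U+00E9), and U+1F600 spelled as a surrogate pair.
        String s = "a\u00E9\uD83D\uDE00";

        System.out.println(s.length());                              // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));         // 3 Unicode code points
        System.out.println(utf8Bytes(s));                            // 7 bytes (1 + 2 + 4)
        System.out.println(utf16Bytes(s));                           // 8 bytes (2 + 2 + 4)
        System.out.println(Character.isHighSurrogate(s.charAt(2)));  // true
    }
}
```

Here the in-memory length (4) matches neither the number of code points (3) nor the UTF-8 byte count (7), and indexing at position 2 yields half of a surrogate pair rather than a whole character. This is the kind of observable behaviour I suspect the "uses UTF-16" statement refers to, but I'd like to know which of the interpretations above it actually covers.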