
From a design standpoint, what made the Windows PowerShell creators choose UTF-16LE over UTF-8, when UTF-8 is common, widely used, and has a smaller footprint?

To my novice eye, UTF-16LE does not seem to provide any advantage over UTF-8 either.

I'm looking for some factual evidence regarding the design decisions that were made.

  • As long as you write a-z, I agree. But for languages between U+0800 and U+FFFF, UTF-16 is actually shorter than UTF-8 (e.g. Thai ฃ U+0E03 takes three bytes in UTF-8: E0 B8 83; a quick check is sketched below). But for the real reason, see the answers below :) – Vbakke Jul 05 '22 at 21:18
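
A minimal way to verify those byte counts (a C11 sketch; nothing here is PowerShell- or Windows-specific, it just encodes U+0E03 both ways):

```c
/* Compile as C11 or later. Compares the encoded size of U+0E03 (Thai ฃ)
   in UTF-8 and in UTF-16, where it fits in a single 16-bit code unit. */
#include <stdio.h>
#include <string.h>
#include <uchar.h>

int main(void) {
    const char     utf8[]  = u8"\u0E03"; /* UTF-8 bytes:  E0 B8 83 */
    const char16_t utf16[] = u"\u0E03";  /* UTF-16 units: 0E03     */

    /* strlen counts the bytes before the terminating NUL: 3. */
    printf("UTF-8:  %zu bytes\n", strlen(utf8));

    /* One 16-bit code unit (2 bytes) for any BMP character. */
    printf("UTF-16: %zu bytes\n",
           (sizeof utf16 / sizeof utf16[0] - 1) * sizeof utf16[0]);
    return 0;
}
```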

1 Answer


Windows jumped from code pages (called "multi-byte") to UTF-16LE (called "wide char"). The code became simpler if you could assume every character was exactly 16 bits, and that is what they assumed (this fixed-width encoding is called UCS-2). The Unicode Consortium later decided that 16 bits was not enough, but by then it was too late for Microsoft to change course, so their definition of "wide char" simply changed from UCS-2 to UTF-16LE.
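
To make the "simpler code" point concrete, here is a minimal C sketch (not actual Windows source; char_len() below is a toy stand-in for real code-page logic such as Win32's IsDBCSLeadByte/CharNext). With fixed 16-bit characters, finding the n-th character is plain array indexing; with a variable-width code page you have to walk the string from the start:

```c
#include <stddef.h>
#include <uchar.h>

/* UCS-2 assumption: every character is exactly one 16-bit unit,
   so the n-th character is just s[n]. */
char16_t nth_char_ucs2(const char16_t *s, size_t n) {
    return s[n];
}

/* Multi-byte code page: a character is 1 or 2 bytes. This toy rule
   treats bytes >= 0x81 as lead bytes of a 2-byte character. */
static size_t char_len(const char *s) {
    return ((unsigned char)*s >= 0x81) ? 2 : 1;
}

/* No direct indexing possible: scan character by character. */
const char *nth_char_mbcs(const char *s, size_t n) {
    while (n-- > 0)
        s += char_len(s);
    return s;
}
```

The same asymmetry shows up in length counting, buffer sizing, and stepping backwards through a string, which is why assuming a fixed 16-bit character made so much early-1990s Windows code simpler.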

Dialecticus
  • How did it make the code simpler? – Shankara Narayana Sep 13 '20 at 01:30
  • @Swagger68 Because then you don't need to write code to handle the possibility of different code pages, or essentially rewrite code twice to support MBCS while keeping single-byte encoding support for performance (this was the early 1990s, when we still had to think about near/far pointers and supporting computers with 4 MB of RAM or less). – Dai Sep 13 '20 at 07:14
  • Also, see the answer in the later duplicate of this question: https://stackoverflow.com/a/66924697/2444168 – Vbakke Jul 05 '22 at 21:21