
Please excuse me, I really need to know how the incorporated Unicode version (Unicode 5) works in ECMAScript 4. I just need to know how it's encoded or decoded, or rather which encoding ECMAScript 4 uses, meaning the encoding used for the char codes (or code points, I think) of strings.

Advice: "ultrapasses" here means "bigger than", or further, for example. I thought it was valid in English.

I basically thought it was UTF-16, but in my tests it ultrapasses U+10FFFF. The maximum character code I got in ECMAScript 4 without an exception was U+FFFFFF, except that when I use String.fromCharCode() to encode that character code, the result is U+1FFFFF (\u{...} produces up to 0xFFFFFF different characters, but String.fromCharCode() only produces up to 0x1FFFFF different characters). In ECMAScript 6 the maximum code point I can get is U+10FFFF, a small difference, and since it uses UCS-2 (at least in my browser, Chrome), ECMAScript 6 generates more code units for such code points (a code unit = 2 bytes). I guess ECMAScript 6 has a small flaw when encoding code points using UCS-2 (not a bug, just a small flaw); just check my other question if you want to know more.
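
(For reference, here is a small sketch of what I mean by ECMAScript 6 generating more code units. This is plain ES6 run in Chrome or Node.js; it has nothing to do with the ES4 prototype.)

 // The highest valid code point comes back as two 16-bit code units.
 const es6 = String.fromCodePoint(0x10ffff);
 es6.length;                      // 2 -- a surrogate pair
 es6.charCodeAt(0).toString(16);  // 'dbff' -- high surrogate
 es6.charCodeAt(1).toString(16);  // 'dfff' -- low surrogate
 es6.codePointAt(0).toString(16); // '10ffff' -- the full code point back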

So 0xFFFFFF is the maximum char code (or code point...?). Why do I think it's a char code in ECMAScript 4? Maybe because there is no String#codePointAt or String#fromCodePoint like in ECMAScript 6, and it really goes outside UCS-2. First let me show you some tests using ECMAScript 4:

(Yes, ECMAScript 4 never shipped, but it exists as a draft, including an unfinished virtual machine for evaluating ECMAScript 4. http://ecmascript.org is down, but it's still on http://archive.org, so I've made a little copy of it in a 7-Zip file.)

 // Decimal: 16777215
 const ch = 0xffffff;
 const chString = '\u{ffffff}';

 // Ultrapasses (exceeds) the maximum char code (or code point), so
 // an exception is thrown, fair enough.
 '\u{1000000}';

 // Ultrapasses it too, but returns '\u{ charCode % 0x1000000 }' anyway.
 String.fromCharCode(ch + 1);

 // Correct.
 chString.charCodeAt(0); // Code: 16777215

 // I didn't expect this one:
 String.fromCharCode(ch); // Gives me '\u{1fffff}' back.

 // A Unicode char code (which is a code point, I think) always
 // corresponds to one character in the string.
 chString.length; // 1
 String.fromCharCode(ch).length; // 1

The ECMAScript 4 overview doesn't say anything further about this; it only mentions that the language incorporates Unicode 5, not which encoding. Which encoding is incorporated in this case? It would also be nice to know why String.fromCharCode(charCode) differs from the \u{...} Unicode escape in the examples above.
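
(For comparison, this is how a finished standard, ECMAScript 6, handles the same difference in any current engine; the ES4 prototype above clearly behaves differently.)

 // In ES6, String.fromCharCode() truncates each argument to a 16-bit code
 // unit (ToUint16), while String.fromCodePoint() and \u{...} accept whole
 // code points up to 0x10FFFF and store them as one or two code units.
 String.fromCharCode(0x10001).charCodeAt(0);   // 1 (0x10001 modulo 0x10000)
 String.fromCodePoint(0x10001).codePointAt(0); // 0x10001, stored as two code units
 '\u{10ffff}'.length;                          // 2 -- a surrogate pair
 // '\u{110000}'                               // SyntaxError: out of range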

  • I quite like the word "ultrapasses", but it doesn't actually exist in English. I'm guessing it means something like "carries on past"? – IMSoP Feb 16 '17 at 21:28
  • ECMAScript 4? [I thought that one never happened.](http://stackoverflow.com/questions/2329602/why-was-ecmascript-4th-edition-completely-scrapped) – user2357112 Feb 16 '17 at 21:29
  • @IMSoP It exists in my language (Portuguese), sorry for my bad terms. It means that the target is bigger than something else, so the target ultrapasses (goes beyond) it. –  Feb 16 '17 at 21:33
  • @user2357112 It never happened, but it has a VM (incomplete, I guess), which can be obtained from archive.org. I still have an ECMAScript 4 package on MediaFire. –  Feb 16 '17 at 21:33
  • So, the obvious question is ... why do you need to know how a partial implementation of an abandoned language standard handles string encoding? – IMSoP Feb 16 '17 at 21:44
  • @IMSoP Because I'll create a programming language based on ECMAScript 4. –  Feb 16 '17 at 21:45
  • Why not simply design your own way of handling character encodings, or borrow from a language which has a finalised, working implementation? By the looks of it, the prototype implementation you're testing here is simply broken - there is no such Unicode code point as U+FFFFFF. – IMSoP Feb 16 '17 at 22:48
  • Oh, and it's definitely nothing to do with Unicode 5; that's just about new characters being added, which a programming language needs to know about to answer questions like "is this a digit?" UTF-16 was added with Unicode 2.0, and the way it works hasn't changed since. – IMSoP Feb 16 '17 at 23:11
  • @IMSoP But what about the U+1FFFFF maximum when using *`String.fromCharCode()`*? Please just tell me whether it's right that Unicode has `0x1FFFFF` different characters. And I got it, I'll implement such a native encoding in my programming language, most probably UTF-16, UTF-8, or who knows... UTF-32, UTF-64? ... –  Feb 16 '17 at 23:19
  • Read up on UTF-16 [on Wikipedia](https://en.wikipedia.org/wiki/UTF-16), or just search online. The information is fairly easy to find. But in a nutshell: yes, the highest Unicode code point that will ever be assigned is 0x10FFFF, which would be represented in UTF-16 as two 16-bit units, 0xDBFF and 0xDFFF. – IMSoP Feb 16 '17 at 23:38
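
(A small sketch of the surrogate-pair arithmetic described in the last comment, in plain ES6; `toSurrogatePair` is just an illustrative name, not a built-in.)

 // UTF-16 encoding of a supplementary code point (above 0xFFFF).
 function toSurrogatePair(codePoint) {
   const offset = codePoint - 0x10000;
   const high = 0xd800 + (offset >> 10);   // high (lead) surrogate
   const low  = 0xdc00 + (offset & 0x3ff); // low (trail) surrogate
   return [high, low];
 }
 toSurrogatePair(0x10ffff).map(u => u.toString(16)); // ['dbff', 'dfff']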
