
I just posted a question about Unicode character constants, where $HIGHCHARUNICODE appeared to be the reason. Now with the default $HIGHCHARUNICODE OFF (Delphi XE2), why is this:

const
  AllLowByteValues =#$00#$01#$02#$03#$04#$05#$06#$07#$08#$09#$0a#$0b#$0c#$0d#$0e#$0f;
  AllHighByteValues=#$D0#$D1#$D2#$D3#$D4#$D5#$D6#$D7#$D8#$D9#$Da#$Db#$Dc#$Dd#$De#$Df;

==> SizeOf(AllLowByteValues[1])  = 2
==> SizeOf(AllHighByteValues[1]) = 2

If "all hexadecimal #$xx 2-digit literals are parsed as AnsiChar" for #$80 ... #$FF, then why is AllHighByteValues a Unicode string and not an AnsiString?

Jan Doggen
  • I think these literals are parsed as Ansi and converted to Unicode (because the default string type is Unicode) – kludg Sep 06 '12 at 11:44
  • Your experiment doesn't actually show how the values are stored. The SizeOf expression is resolved at compile time and doesn't actually inspect the stored value. You need to show that the strings you've defined are actually stored as WideChar values *in the EXE file*. – Rob Kennedy Sep 06 '12 at 13:39

2 Answers


That's because string constants are PChar, which in Delphi 2009 and later is an alias for PWideChar, and so they are made up of UTF-16 elements.

From the documentation:

String constants are assignment-compatible with the PChar and PWideChar types, which represent pointers to null-terminated arrays of Char and WideChar values.
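As a quick illustration of that compatibility, here is a minimal sketch (the constant and variable names are made up; assumes Delphi 2009+, where PChar = PWideChar):

```pascal
const
  Greeting = 'hello';
var
  P: PWideChar;
begin
  // A string constant is assignment-compatible with PWideChar,
  // so in this context it is emitted as a null-terminated UTF-16 array.
  P := Greeting;
  Writeln(SizeOf(P^));  // 2: each element is a WideChar
end.
```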

David Heffernan

You are not taking into account that String and Character literals are context-sensitive in D2009+. If a literal is used in an Ansi context, it is stored as Ansi. If a literal is used in a Unicode context, it is stored as Unicode.

HIGHCHARUNICODE only applies to 3-digit numeric Character literals between #128 and #255, and to 2-digit hex Character literals between #$80 and #$FF. Those particular values are ambiguous between Ansi and Unicode, so HIGHCHARUNICODE is used to resolve the ambiguity. HIGHCHARUNICODE does not apply to other kinds of literals, including String literals.

If you pass a String or Character literal to SizeOf(), there is no Ansi/Unicode context in the source code for the compiler to use, so it uses a Unicode context, except in the specific case where HIGHCHARUNICODE applies, in which case an Ansi context is used if HIGHCHARUNICODE is OFF. That is what you are seeing happen.
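A small sketch of that context-sensitivity (assumes Delphi 2009+ with the default {$HIGHCHARUNICODE OFF}; the names are illustrative):

```pascal
const
  C = #$D0;  // 2-digit hex Character literal in the ambiguous #$80..#$FF range
var
  A: AnsiChar;
  W: Char;   // Char = WideChar in D2009+
begin
  A := C;              // Ansi context: stored as the single byte $D0
  W := C;              // Unicode context: stored as the UTF-16 unit U+00D0
  Writeln(SizeOf(A));  // 1
  Writeln(SizeOf(W));  // 2

  // No surrounding Ansi/Unicode context here, so HIGHCHARUNICODE decides:
  Writeln(SizeOf(#$D0));  // 1 with HIGHCHARUNICODE OFF (AnsiChar)
end.
```

A multi-character literal such as AllHighByteValues is a String literal, not a Character literal, so HIGHCHARUNICODE does not apply to it and indexing it yields a 2-byte Char.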

Remy Lebeau