4

I thought I was beginning to understand Unicode, but this beats me:

const
  c1 = #1;   // SizeOf() = 2
  c2 = #33;  // SizeOf() = 2
  c3 = #127; // SizeOf() = 2
  c4 = #128; // SizeOf() = 1
  c5 = #160; // SizeOf() = 1
  c6 = #161; // SizeOf() = 1
  c7 = #255; // SizeOf() = 1

Can anyone explain? I'm using Delphi XE2 with the default Windows-1252 code page.

Thanks Jan

David Heffernan
Jan Doggen
  • note also http://qc.embarcadero.com/wc/qcmain.aspx?d=100685 gotcha – Arioch 'The Sep 06 '12 at 11:31
  • @Arioch'The - I wonder how it would be possible to fix this 'bug', because that really is how Unicode is implemented in Delphi; Ord(Ch) generally depends on the default ANSI codepage, so you can get different binaries when compiling on systems with different ANSI codepages. – kludg Sep 06 '12 at 11:52
  • @Serg Which default? In the same project we can have sources encoded in at least three different codepages... ---- Personally I think Ord(WideChar) should map to a UTF-16 Word if possible, and Ord(AnsiChar) should map to a GetACP() Byte if possible. Correspondingly, there should be Chr(Byte): AnsiChar and Chr(Word): WideChar. What to do with MBCS and Unicode surrogates I don't know, but I hope they are rare beasts. – Arioch 'The Sep 06 '12 at 12:07
  • There are no problems with 'pure' Unicode and the {$HIGHCHARUNICODE ON} setting; the problem is when you use Ansi chars, with {$HIGHCHARUNICODE} either ON or OFF, because, as Marjan said, `Ansi` in Delphi means locale-specific, which is pretty awkward and useless when you need to support multiple Ansi codepages. – kludg Sep 06 '12 at 12:26

1 Answer

8

That is documented - see $HIGHCHARUNICODE directive
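
A minimal sketch of what the directive changes, assuming the behaviour described in the $HIGHCHARUNICODE documentation (with ON, all `#xxx` literals are parsed as WideChar; OFF is the default). The program name and the constants `cOff` and `cOn` are made up for illustration:

```pascal
program HighCharDemo;

{$APPTYPE CONSOLE}

{$HIGHCHARUNICODE OFF} // the default: #128..#255 are treated as AnsiChar
const
  cOff = #128; // hypothetical name; same literal as c4 in the question

{$HIGHCHARUNICODE ON}  // per the documentation, all literals become WideChar
const
  cOn = #128;  // hypothetical name; same literal, declared under ON

begin
  // Print the sizes rather than assume them; the question reports 1 for the
  // OFF case, and the documentation suggests 2 for the ON case.
  Writeln('OFF: SizeOf(#128) = ', SizeOf(cOff));
  Writeln('ON:  SizeOf(#128) = ', SizeOf(cOn));
end.
```

If the ON case prints 2, placing {$HIGHCHARUNICODE ON} at the top of a unit should make all seven constants in the question the same size.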

kludg
  • @DavidHeffernan I think it's for the benefit of pre-Unicode code, to make #160 the same character it was in D2007 and earlier. –  Sep 06 '12 at 11:02
  • @DavidHeffernan After re-reading, I think you're right, thanks for the clarification. –  Sep 06 '12 at 11:05
  • 1
    @DavidHeffernan - the feature is bizarre but consistent. There were probably compatibility reasons for it, the same ones that caused `AnsiUpperCase` to become a Unicode function in Delphi 2009. – kludg Sep 06 '12 at 11:07
  • The Ansi* functions have been misnamed since their inception. They should have been named LocaleSpecific* instead of Ansi*. The character set is just one part of locale specificity. – Marjan Venema Sep 06 '12 at 11:12
  • 1
    @Serg Marco Cantu's book confirms what you say about the reasoning behind this. – David Heffernan Sep 06 '12 at 11:12
  • 2
    EMBT should have made $HIGHCHARUNICODE=ON the default. I knew of its existence but did not look, assuming that consistent behaviour would be the standard and that compiler directives would be for the exceptions (e.g. for backward compatibility). – Jan Doggen Sep 06 '12 at 11:24
  • 1
    @Jan Even if not for 2009, it should be the default by now! – David Heffernan Sep 06 '12 at 11:29
  • @MarjanVenema similar for the System32 directory that contains the x64 DLLs and SysWoW64 the x86 ones: http://stackoverflow.com/questions/949959/why-do-64bit-dlls-go-to-system32-and-32bit-dlls-to-syswow64-on-64bit-windows – Jeroen Wiert Pluimers Sep 06 '12 at 13:52