
I have developed a small encoder class that encodes and decodes GB18030. I need this to read and write DICOM files, which may use GB18030. For those of you who are not into these things, GB18030 is a Chinese multibyte encoding (1, 2 or 4 bytes per character) that covers the same code points as Unicode (over 1.1M), but with a very complex mapping, to keep it compatible with older Chinese standards (GBK).

But on Windows this turned out to be easier than I expected:

Type
  GB18030String = Type AnsiString(54936);

Class Procedure TMEGB18030Encoder.Decode(Const Input: TMEStreamHandler; Const Output: TMEMultiCharSetString);
Var
  sGB18030: GB18030String;
  sUTF16LE: UnicodeString;
Begin
  sGB18030 := AnsiStringOf(Input.ReadRawByteString, 54936);
  sUTF16LE := UnicodeString(sGB18030);
  TMEUTF16LEEncoder.Decode(RawByteStringOf(sUTF16LE), Output);
End;

Class Procedure TMEGB18030Encoder.Encode(Const Input: TMEMultiCharSetString; Const Output: TMEStreamHandler);
Var
  sUTF16LE: UnicodeString;
  sGB18030: GB18030String;
Begin
  sUTF16LE := StringOf(TMEUTF16LEEncoder.Encode(Input));
  sGB18030 := GB18030String(sUTF16LE);
  Output.WriteRawByteString(RawByteStringOf(AnsiString(sGB18030)));
End;

This basically uses an AnsiString with a codepage of 54936 and lets the OS handle the conversion between GB18030 and Unicode (UTF-16LE).

But obviously this code won't even compile on Android, as there are no AnsiStrings there, and I suppose there is not much support for Windows codepages. I could be wrong on this, though, as I am actually going through a call to LocaleCharsFromUnicode, which I think is emulated on non-Windows platforms.

All this is subject to unit testing... but if I can't even find a way to recompile this code on Android, I see no way to start testing it. Any hints or ideas?

Frazz
  • Is this any help to get you started? https://stackoverflow.com/questions/26892449/converting-unicodestring-to-ansistring – Dsm Jul 21 '17 at 10:43
  • Take a look at [TEncoding class](http://docwiki.embarcadero.com/Libraries/Tokyo/en/System.SysUtils.TEncoding) it should work on all platforms. – Dalija Prasnikar Jul 21 '17 at 11:13
  • @Dsm that is exactly where my AnsiStringOf function comes from. It actually manipulates a RawByteString setting its codepage. But I don't see a way of getting this to work on Android. Keep in mind that the *magic* here comes from the hard typecast sUTF16LE := UnicodeString(sGB18030) and sGB18030 := GB18030String(sUTF16LE). It all works because Delphi knows that that string of bytes is encoded using a particular codepage, and converts the data accordingly. – Frazz Jul 21 '17 at 11:40
  • @DalijaPrasnikar - TEncoding works on all platforms, but it provides no support for GB18030, not even in the TCharSetEncoding implementation (which does cover extended ASCII and other common encodings). Reimplementing GB18030-Unicode conversion is not a task I have time to struggle with. – Frazz Jul 21 '17 at 11:50
  • Android has a [Charset](https://developer.android.com/reference/java/nio/charset/Charset.html) class that implements more encodings; however, the encodings reported by `availableCharsets` beyond the standard ones are device dependent. Still, that may be the way to go. – Dalija Prasnikar Jul 21 '17 at 12:41
  • Have you found any solution? – Nick Chan Abdullah Mar 12 '20 at 10:28
  • No... I never found any :( – Frazz Mar 30 '20 at 09:32
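
For anyone who wants to try the Charset suggestion from the comments, here is a minimal sketch in Java (the language of that API, not Delphi). It only checks whether the runtime reports a GB18030 charset and round-trips a short string through it. The class name `Gb18030Check` and its helper methods are hypothetical names made up for illustration; only `Charset.isSupported`, `Charset.forName`, `String.getBytes(Charset)` and `new String(byte[], Charset)` are real platform calls. Whether a particular Android device ships GB18030 is not guaranteed, as noted above, and calling this from Delphi would still need a JNI bridge, which is not shown.

```java
import java.nio.charset.Charset;

public class Gb18030Check {
    // Charset name as registered with java.nio; availability depends on the runtime.
    static final String NAME = "GB18030";

    // True if this runtime (JVM or Android device) provides a GB18030 charset.
    static boolean isAvailable() {
        return Charset.isSupported(NAME);
    }

    // Unicode text -> GB18030 bytes. Throws if the charset is unavailable.
    static byte[] encode(String text) {
        return text.getBytes(Charset.forName(NAME));
    }

    // GB18030 bytes -> Unicode text. Throws if the charset is unavailable.
    static String decode(byte[] raw) {
        return new String(raw, Charset.forName(NAME));
    }

    public static void main(String[] args) {
        if (!isAvailable()) {
            System.out.println("GB18030 is not available on this runtime");
            return;
        }
        String sample = "\u4e2d\u6587"; // two CJK characters, written as escapes
        byte[] raw = encode(sample);
        System.out.println(raw.length + " bytes, round trip ok: "
                + sample.equals(decode(raw)));
    }
}
```

If `isAvailable()` returns false on a target device, this route is closed for that device and the conversion tables would have to come from somewhere else.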

0 Answers