A web service I use from Delphi 10.3 returns a string consisting of these four bytes: F0 9F 99 82. I expect a slightly smiling face emoji. This site shows this byte sequence as the UTF-8 representation of that emoji. So I guess my string holds the UTF-8 representation rather than an actual Unicode string? How do I convert it into the actual Unicode representation so that I can show it, for example, in a TMemo?

JP_dev
  • That's not a "_4 byte string_" - it's **one** character expressed in 4 bytes in UTF-8. Use [`TEncoding.UTF8.GetString( my_bytes )`](https://docwiki.embarcadero.com/Libraries/Sydney/en/System.SysUtils.TEncoding.GetString) then. – AmigoJack Nov 10 '21 at 15:59
  • Thank you, that worked. You should repost this as an answer so I can accept it. – JP_dev Nov 10 '21 at 16:17
  • How do I get the byte array in the first place? I guess TEncoding.XXX.GetBytes(InputString), but I don't know what to insert for XXX. I only have the string variable filled by the web service available, not an actual byte array. – JP_dev Nov 11 '21 at 06:22
  • So the related Qs on the right side like [How to convert strings to array of byte and back](https://stackoverflow.com/q/21442665/4299358) don't answer that for you? Should your Q include actual code so we see how everything looks in the first place? Are you sure you don't want to create a new separate Q? – AmigoJack Nov 11 '21 at 11:24
  • Indeed it does. I didn't notice it. – JP_dev Nov 12 '21 at 09:12

1 Answer

The character has the Unicode code point U+1F642. How text is displayed depends on its encoding, that is, on how a sequence of bytes has to be interpreted:

  • in UTF-8 one character can consist of 8, 16, 24 or 32 bits (1 to 4 bytes); this one is $F0 $9F $99 $82.
  • in UTF-16 one character can consist of 16 or 32 bits (2 or 4 bytes = 1 or 2 Words); this one is $D83D $DE42 (a surrogate pair).
  • in UTF-32 one character always consists of 32 bits (4 bytes = 1 Cardinal or DWord) and always equals the code point, that is $1F642.
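The byte sequences listed above can be reproduced with a small console program. This is only a sketch: `BytesToHex` is a hypothetical helper written for this example, and it assumes a Delphi version in which `System.Character` provides `Char.ConvertFromUtf32`.

```delphi
program EmojiEncodings;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Character;

// Hypothetical helper: formats a byte array as space-separated hex pairs.
function BytesToHex(const Bytes: TBytes): string;
var
  B: Byte;
begin
  Result := '';
  for B in Bytes do
    Result := Result + IntToHex(B, 2) + ' ';
  Result := TrimRight(Result);
end;

var
  S: string;
begin
  // Build a native (UTF-16) Delphi string from the code point U+1F642.
  S := Char.ConvertFromUtf32($1F642);

  // Length counts UTF-16 code units, not characters.
  Writeln('UTF-16 code units: ', Length(S));

  // The same character in three different byte encodings.
  Writeln('UTF-8:    ', BytesToHex(TEncoding.UTF8.GetBytes(S)));
  Writeln('UTF-16LE: ', BytesToHex(TEncoding.Unicode.GetBytes(S)));
  Writeln('UTF-16BE: ', BytesToHex(TEncoding.BigEndianUnicode.GetBytes(S)));
end.
```

The UTF-8 line should show the four bytes from the question, and the UTF-16BE line should show the surrogate pair $D83D $DE42 byte by byte.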

In Delphi, you can use:

  • TEncoding.UTF8.GetString() for UTF-8,
  • TEncoding.Unicode.GetString() if you have UTF-16LE,
  • TEncoding.BigEndianUnicode.GetString() if you have UTF-16BE.
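As a minimal sketch of the UTF-8 case, assuming the four bytes from the question are already available as a `TBytes` array (the variable names and the `Memo1` control in the comment are illustrative only):

```delphi
program DecodeUtf8Bytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Raw: TBytes;
  S: string;
begin
  // The four UTF-8 bytes from the question.
  Raw := TBytes.Create($F0, $9F, $99, $82);

  // Interpret the bytes as UTF-8 and get a native UTF-16 Delphi string.
  S := TEncoding.UTF8.GetString(Raw);

  // The emoji lies outside the Basic Multilingual Plane, so the string
  // holds a surrogate pair: two UTF-16 code units for one character.
  Writeln('UTF-16 code units: ', Length(S));

  // In a VCL form the result could then be shown with, for example:
  //   Memo1.Lines.Add(S);   // Memo1 is a hypothetical TMemo on the form
end.
```

Whether the emoji is actually rendered depends on the font and control in use; the string itself is correct either way.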

Keep in mind that this is just a character like every letter, symbol and whitespace in this text: it can be selected (e.g. with Ctrl+A) and copied to the clipboard (e.g. with Ctrl+C). No special care is needed.

Remy Lebeau
AmigoJack
  • If you know that the received text is UTF-8 encoded, another option would be to put the raw bytes into a `UTF8String`, and then just assign that to a normal `String`. The RTL will handle the conversion from UTF-8 to UTF-16 (Delphi's native string encoding) for you. – Remy Lebeau Nov 10 '21 at 17:19
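The `UTF8String` approach from the comment above could look roughly like this. It is a sketch that assumes a desktop compiler where `UTF8String` and `PAnsiChar` are available, and it starts from a `TBytes` array purely for illustration:

```delphi
program Utf8StringAssign;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Bytes: TBytes;
  Raw: UTF8String;
  S: string;
begin
  // The four UTF-8 bytes from the question.
  Bytes := TBytes.Create($F0, $9F, $99, $82);

  // Copy the raw bytes into a UTF8String without any conversion.
  SetString(Raw, PAnsiChar(@Bytes[0]), Length(Bytes));

  // Converting to string lets the RTL translate UTF-8 to UTF-16.
  // The explicit cast avoids the implicit string cast warning.
  S := string(Raw);

  Writeln('UTF-8 bytes: ', Length(Raw));
  Writeln('UTF-16 code units: ', Length(S));
end.
```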