How can I convert a 4-byte string into a Unicode emoji?

Question

A web service I use from Delphi 10.3 returns a string consisting of these four bytes: F0 9F 99 82. I expect a slightly smiling face emoji. This site shows this byte sequence as the UTF-8 representation of that emoji. So I guess I have a UTF-8 representation in my string, but is it an actual Unicode string? How do I convert my string into the actual Unicode representation so that I can show it, for example, in a TMemo?

That's not a "_4 byte string_" - it's **one** character encoded as 4 bytes in UTF-8. Use [`TEncoding.UTF8.GetString( my_bytes )`](https://docwiki.embarcadero.com/Libraries/Sydney/en/System.SysUtils.TEncoding.GetString) then. — AmigoJack, Nov 10 '21 at 15:59
Thank you, that did work. You should repost this as an answer so I can accept it. — JP_dev, Nov 10 '21 at 16:17
How do I get the byte array in the first place? I guess TEncoding.XXX.GetBytes(InputString), but I don't know what to insert for XXX. I only have the string variable filled by the web service available, not an actual byte array. — JP_dev, Nov 11 '21 at 06:22
So the related Qs on the right side like [How to convert strings to array of byte and back](https://stackoverflow.com/q/21442665/4299358) don't answer that for you? Should your Q include actual code so we see how everything looks in the first place? Are you sure you don't want to create a new separate Q? — AmigoJack, Nov 11 '21 at 11:24

score 2 · Accepted Answer · edited Nov 10 '21 at 17:17

The character has the Unicode code point U+1F642. How it is stored depends on the encoding, which defines how a sequence of bytes has to be interpreted:

In UTF-8 one character can consist of 8, 16, 24 or 32 bits (1 to 4 bytes); this one is $F0 $9F $99 $82.
In UTF-16 one character can consist of 16 or 32 bits (2 or 4 bytes = 1 or 2 Words); this one is $D83D $DE42 (a surrogate pair).
In UTF-32 one character always consists of 32 bits (4 bytes = 1 Cardinal or DWord) and always equals the code point, that is $1F642.
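The three representations listed above can be derived from the code point with plain bit arithmetic. The following is a minimal sketch rather than anything from the original answer: the program name and variable names are made up for illustration, and it only handles this one code point (outside the Basic Multilingual Plane, so it needs the 4-byte UTF-8 form and a UTF-16 surrogate pair).

```delphi
program EncodeU1F642;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  CodePoint = $1F642; // slightly smiling face

var
  Offset: Cardinal;
  HighSurrogate, LowSurrogate: Word;
  B1, B2, B3, B4: Byte;
begin
  // UTF-16: subtract $10000, then split the remaining 20 bits into 10 + 10.
  Offset := CodePoint - $10000;                      // $F642
  HighSurrogate := Word($D800 + (Offset shr 10));    // $D83D
  LowSurrogate  := Word($DC00 + (Offset and $3FF));  // $DE42

  // UTF-8, 4-byte form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
  B1 := Byte($F0 or (CodePoint shr 18));             // $F0
  B2 := Byte($80 or ((CodePoint shr 12) and $3F));   // $9F
  B3 := Byte($80 or ((CodePoint shr 6) and $3F));    // $99
  B4 := Byte($80 or (CodePoint and $3F));            // $82

  Writeln(IntToHex(HighSurrogate, 4), ' ', IntToHex(LowSurrogate, 4));
  Writeln(IntToHex(B1, 2), ' ', IntToHex(B2, 2), ' ',
          IntToHex(B3, 2), ' ', IntToHex(B4, 2));
end.
```

In UTF-32 no computation is needed, since the single 32-bit value is the code point itself.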

In Delphi, you can use:

TEncoding.UTF8.GetString() for UTF-8,
TEncoding.Unicode.GetString() if you have UTF-16LE, and
TEncoding.BigEndianUnicode.GetString() if you have UTF-16BE.
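For the bytes from the question, a call could look roughly like this. It is a sketch under the assumption that the four bytes are already available as a `TBytes` array; the program and variable names are illustrative only.

```delphi
program EmojiFromUtf8;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Utf8Bytes: TBytes;
  Decoded: string;
begin
  // The four bytes from the question: UTF-8 for U+1F642.
  Utf8Bytes := TBytes.Create($F0, $9F, $99, $82);

  // Decode UTF-8 into Delphi's native UTF-16 string.
  Decoded := TEncoding.UTF8.GetString(Utf8Bytes);

  // One code point, but two UTF-16 code units (a surrogate pair).
  Writeln(Length(Decoded));                            // 2
  Writeln(IntToHex(Ord(Decoded[Low(Decoded)]), 4));    // D83D
  Writeln(IntToHex(Ord(Decoded[High(Decoded)]), 4));   // DE42
end.
```

In a VCL application the resulting string can then be handed to a control like any other text, for instance `Memo1.Lines.Add(Decoded)` (assuming a TMemo named `Memo1`). Whether the glyph actually renders depends on the font in use.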

Keep in mind that this is just a character like every letter, symbol and whitespace in this text: it can be selected (e.g. Ctrl+A) and copied to the clipboard (e.g. Ctrl+C). No special care is needed.

If you know that the received text is UTF-8 encoded, another option would be to put the raw bytes into a `UTF8String`, and then just assign that to a normal `String`. The RTL will handle the conversion from UTF-8 to UTF-16 (Delphi's native string encoding) for you. — Remy Lebeau, Nov 10 '21 at 17:19
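The `UTF8String` approach from the comment above might look like the following. This is a hedged sketch, not code from the thread: it assumes a desktop compiler where `PAnsiChar` is available, and it assumes the raw bytes are held in a `TBytes` array (the names are made up for the example).

```delphi
program Utf8StringAssign;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Utf8Bytes: TBytes;
  Raw: UTF8String;
  Decoded: string;
begin
  Utf8Bytes := TBytes.Create($F0, $9F, $99, $82);

  // Copy the raw bytes into a UTF8String without any conversion.
  SetString(Raw, PAnsiChar(@Utf8Bytes[0]), Length(Utf8Bytes));

  // Converting to string lets the RTL transcode UTF-8 to UTF-16.
  // The explicit cast avoids the implicit string cast warning.
  Decoded := string(Raw);

  Writeln(Length(Decoded)); // 2 UTF-16 code units
end.
```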
