A web service I use in my Delphi 10.3 application returns a string to me consisting of these four bytes: F0 9F 99 82. I expect a slightly smiling emoji. This site shows this byte sequence as the UTF-8 representation of that emoji. So I guess I have a UTF-8 representation in my string, but is it an actual Unicode string? How do I convert my string into the actual Unicode representation, to show it, for example, in a TMemo?
-
That's not a "_4 byte string_" - it's **one** character expressed in 4 bytes as of UTF-8. Use [`TEncoding.UTF8.GetString( my_bytes )`](https://docwiki.embarcadero.com/Libraries/Sydney/en/System.SysUtils.TEncoding.GetString) then. – AmigoJack Nov 10 '21 at 15:59
-
Thank you, that did work. You should repost this as a possible solution, so I can accept it. – JP_dev Nov 10 '21 at 16:17
-
How do I get the byte array in the first place? I guess TEncoding.XXX.GetBytes(InputString), but I don't know what to insert for XXX. I only have the string variable filled by the web service available, not an actual byte array. – JP_dev Nov 11 '21 at 06:22
-
So the related Qs on the right side like [How to convert strings to array of byte and back](https://stackoverflow.com/q/21442665/4299358) don't answer that for you? Should your Q include actual code so we see how everything looks in the first place? Are you sure you don't want to create a new separate Q? – AmigoJack Nov 11 '21 at 11:24
-
Indeed it does. I didn't notice it. – JP_dev Nov 12 '21 at 09:12
1 Answer
The character has the Unicode code point U+1F642. How text is stored is defined by an encoding, which determines how a sequence of bytes has to be interpreted:
- in UTF-8 one character can consist of 8, 16, 24 or 32 bits (1 to 4 `Byte`s); this one is `$F0 $9F $99 $82`.
- in UTF-16 one character can consist of 16 or 32 bits (2 or 4 bytes = 1 or 2 `Word`s); this one is `$D83D $DE42` (using surrogates).
- in UTF-32 one character always consists of 32 bits (4 bytes = 1 `Cardinal` or `DWord`) and always equals the code point, that is `$1F642`.
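The byte and word values in the list above can be reproduced with plain bit arithmetic. The following is a minimal sketch, assuming a console program (the program name `EncodeCodePoint` is made up for illustration); it only handles code points above U+FFFF, which is the case for this emoji:

```delphi
program EncodeCodePoint;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  CodePoint = $1F642; // slightly smiling face

var
  V: Integer;
begin
  // UTF-16: subtract $10000, then split the remaining 20 bits
  // into a high and a low surrogate of 10 bits each.
  V := CodePoint - $10000;
  Writeln('UTF-16: $', IntToHex($D800 + (V shr 10), 4),
          ' $', IntToHex($DC00 + (V and $3FF), 4));
  // prints: UTF-16: $D83D $DE42

  // UTF-8, 4-byte form: 3 + 6 + 6 + 6 payload bits.
  Writeln('UTF-8:  $', IntToHex($F0 or (CodePoint shr 18), 2),
          ' $', IntToHex($80 or ((CodePoint shr 12) and $3F), 2),
          ' $', IntToHex($80 or ((CodePoint shr 6) and $3F), 2),
          ' $', IntToHex($80 or (CodePoint and $3F), 2));
  // prints: UTF-8:  $F0 $9F $99 $82
end.
```

In practice you would let `TEncoding` do this work; the sketch is only meant to show where the numbers come from.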
In Delphi, you can use:
- `TEncoding.UTF8.GetString()` for UTF-8
- (or `TEncoding.Unicode.GetString()` if you'd have UTF-16LE
- and `TEncoding.BigEndianUnicode.GetString()` if you'd have UTF-16BE).
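As a rough illustration of the UTF-8 case, here is a minimal console sketch. It assumes the four bytes are already available as a `TBytes` array; the program and variable names are hypothetical:

```delphi
program Utf8ToString;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Raw: TBytes;
  S: string;
begin
  // The four UTF-8 bytes from the question (U+1F642).
  Raw := TBytes.Create($F0, $9F, $99, $82);

  // Decode UTF-8 into Delphi's native UTF-16 string.
  S := TEncoding.UTF8.GetString(Raw);

  // One code point, but two UTF-16 code units (a surrogate pair).
  Writeln(Length(S));               // 2
  Writeln(IntToHex(Ord(S[1]), 4));  // D83D
  Writeln(IntToHex(Ord(S[2]), 4));  // DE42
end.
```

In a VCL application the resulting `S` could then be passed to something like `Memo1.Lines.Add(S)` (`Memo1` being an assumed `TMemo` on the form); whether the glyph is actually drawn depends on the fonts available on the system.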
Keep in mind that this emoji is just a character like each letter, symbol and whitespace of this text: it can be selected (e.g. with Ctrl+A) and copied to the clipboard (e.g. with Ctrl+C). No special care is needed.

Remy Lebeau

AmigoJack
-
If you know that the received text is UTF-8 encoded, another option would be to put the raw bytes into a `UTF8String`, and then just assign that to a normal `String`. The RTL will handle the conversion from UTF-8 to UTF-16 (Delphi's native string encoding) for you. – Remy Lebeau Nov 10 '21 at 17:19
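The `UTF8String` approach from the comment above might look roughly like this. It is a sketch only, assuming a Windows console program and a non-empty `TBytes` array; the program and variable names are made up for illustration:

```delphi
program Utf8StringAssign;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Raw: TBytes;
  U8: UTF8String;
  S: string;
begin
  Raw := TBytes.Create($F0, $9F, $99, $82);

  // Copy the raw bytes into the UTF8String as they are, without conversion.
  // Raw must not be empty here, otherwise Raw[0] is out of range.
  SetLength(U8, Length(Raw));
  Move(Raw[0], PAnsiChar(U8)^, Length(Raw));

  // Assigning to a normal string lets the RTL convert UTF-8 to UTF-16.
  S := string(U8);

  Writeln(Length(S)); // 2 (one surrogate pair)
end.
```

The explicit `string(...)` cast is not strictly required for the conversion, but it makes the intent visible and avoids an implicit-conversion compiler warning.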