When converting UTF-8 encoded emoji to a string with UTF8ToString, we do not get the correct characters. We receive these UTF-8 bytes from an external interface. We checked the bytes with an online UTF-8 decoder, which displayed the correct characters. I suspect these are composite characters.
procedure TestUTF8Convertion;
const
  // Each emoji as two 3-byte sequences, separated by spaces (#$20)
  utf8Denormalized: RawByteString = #$ED#$A0#$BD#$ED#$B8#$85#$20 + #$ED#$A0#$BD#$ED#$B8#$86#$20 + #$ED#$A0#$BD#$ED#$B8#$8A;
  // The same three emoji as regular 4-byte UTF-8 sequences
  utf8Normalized: RawByteString = #$F0#$9F#$98#$85 + #$F0#$9F#$98#$86 + #$F0#$9F#$98#$8A;
begin
  Memo1.Lines.Add('Denormalized: ' + UTF8ToString(utf8Denormalized));
  Memo1.Lines.Add('Normalized: ' + UTF8ToString(utf8Normalized));
end;
Output in Memo1:
Denormalized: ���� ���� ����
Normalized: 😅😆😊
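The byte pattern can also be checked by hand. The minimal sketch below unpacks the first two 3-byte sequences of utf8Denormalized as if they were ordinary UTF-8. The program name and the `DecodeThreeBytes` helper are hypothetical and only serve this illustration. If the results fall into $D800..$DBFF and $DC00..$DFFF, the input contains UTF-16 surrogate halves that were encoded one by one (often called CESU-8), which a strict UTF-8 decoder may reject, rather than composite characters.

```pascal
program DecodeThreeByteSequence;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: unpacks one 3-byte sequence of the form
// 1110xxxx 10xxxxxx 10xxxxxx into the 16-bit value it carries.
// No validation is done; this is for illustration only.
function DecodeThreeBytes(b1, b2, b3: Byte): Word;
begin
  Result := Word((Integer(b1 and $0F) shl 12) or
                 (Integer(b2 and $3F) shl 6) or
                 Integer(b3 and $3F));
end;

begin
  // First six bytes of utf8Denormalized from the question.
  Writeln(IntToHex(Integer(DecodeThreeBytes($ED, $A0, $BD)), 4)); // D83D (high surrogate)
  Writeln(IntToHex(Integer(DecodeThreeBytes($ED, $B8, $85)), 4)); // DE05 (low surrogate)
end.
```

Taken together, $D83D and $DE05 form the UTF-16 surrogate pair for U+1F605, the same code point that the first four bytes of utf8Normalized encode.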
Writing our own conversion function based on the WinAPI function MultiByteToWideChar
did not solve the issue either.
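function UTF8DenormalizedToString(s: PAnsiChar): string;
var
  pwc: PWideChar;
  len: Integer;
begin
  // Reserve room for the terminating #0 as well; UTF-16 never needs more
  // code units than the UTF-8 input has bytes.
  GetMem(pwc, (Length(s) + 1) * SizeOf(WideChar));
  try
    // With CP_UTF8, dwFlags must be 0 or MB_ERR_INVALID_CHARS; MB_PRECOMPOSED
    // makes the call fail with ERROR_INVALID_FLAGS.
    len := MultiByteToWideChar(CP_UTF8, 0, s, -1, pwc, Length(s) + 1);
    // With cbMultiByte = -1 the returned count includes the terminating #0.
    if len > 0 then
      Dec(len);
    SetString(Result, pwc, len);
  finally
    FreeMem(pwc);
  end;
end;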
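One possible workaround, assuming the input really is UTF-8 in which each surrogate half was encoded as its own 3-byte sequence, is to decode the bytes by hand and copy the surrogates through unchanged. The following is a minimal sketch, not a vetted implementation: `Cesu8ToString` and `Tail` are hypothetical names, continuation bytes are not validated, and a desktop Delphi compiler with 1-based `RawByteString` indexing is assumed.

```pascal
program Cesu8Sketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not part of the RTL): decodes UTF-8 into UTF-16 and
// passes 3-byte encoded surrogate halves through as UTF-16 code units.
// Assumption: input is mostly well formed; continuation bytes are not checked.
function Cesu8ToString(const s: RawByteString): string;
var
  i, n, outLen: Integer;
  b: Byte;
  cp: Cardinal;

  // Payload (low 6 bits) of the continuation byte at the given index.
  function Tail(index: Integer): Cardinal;
  begin
    Result := Byte(s[index]) and $3F;
  end;

begin
  n := Length(s);
  SetLength(Result, n); // UTF-16 never needs more code units than input bytes
  outLen := 0;
  i := 1;
  while i <= n do
  begin
    b := Byte(s[i]);
    if b < $80 then
    begin
      cp := b;
      Inc(i);
    end
    else if ((b and $E0) = $C0) and (i + 1 <= n) then
    begin
      cp := (Cardinal(b and $1F) shl 6) or Tail(i + 1);
      Inc(i, 2);
    end
    else if ((b and $F0) = $E0) and (i + 2 <= n) then
    begin
      // Also covers ED A0..BF xx, i.e. encoded surrogate halves.
      cp := (Cardinal(b and $0F) shl 12) or (Tail(i + 1) shl 6) or Tail(i + 2);
      Inc(i, 3);
    end
    else if ((b and $F8) = $F0) and (i + 3 <= n) then
    begin
      cp := (Cardinal(b and $07) shl 18) or (Tail(i + 1) shl 12) or
            (Tail(i + 2) shl 6) or Tail(i + 3);
      Inc(i, 4);
    end
    else
    begin
      cp := $FFFD; // invalid lead byte or truncated sequence
      Inc(i);
    end;

    if cp >= $10000 then
    begin
      // Regular 4-byte sequence: split into a surrogate pair.
      Dec(cp, $10000);
      Inc(outLen);
      Result[outLen] := WideChar($D800 or (cp shr 10));
      Inc(outLen);
      Result[outLen] := WideChar($DC00 or (cp and $3FF));
    end
    else
    begin
      Inc(outLen);
      Result[outLen] := WideChar(cp);
    end;
  end;
  SetLength(Result, outLen);
end;

const
  // First emoji of utf8Denormalized from the question, as raw bytes.
  InputBytes: array[0..5] of Byte = ($ED, $A0, $BD, $ED, $B8, $85);
var
  input: RawByteString;
  decoded: string;
  ch: Char;
begin
  SetLength(input, Length(InputBytes));
  Move(InputBytes[0], Pointer(input)^, Length(InputBytes));
  decoded := Cesu8ToString(input);
  // Print the UTF-16 code units in hex to stay independent of console encoding.
  for ch in decoded do
    Write(IntToHex(Ord(ch), 4), ' '); // D83D DE05
  Writeln;
end.
```

With such a function, the result could be added to Memo1 in place of the UTF8ToString call for the denormalized input. Regular 4-byte sequences such as those in utf8Normalized take the `cp >= $10000` branch and end up as the same surrogate pairs.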