I'm trying to do exactly what the title says: insert an emoji into a string in Delphi 2007, as in the example below:

procedure TForm1.Button1Click(Sender: TObject);
var
  s: string;
begin
  s := 'This is my original string (y)';
  s := AnsiReplaceStr(s, '(y)', '👍');
  ShowMessage(s);
end;

I can even paste the emoji into the IDE's code editor, but at runtime ShowMessage displays this:

This is my original string ????

Is there a way to achieve this in Delphi 2007? For several reasons, I can't upgrade Delphi right now.

Someone said my question is already answered in this topic:

Handling a Unicode String in Delphi Versions <= 2007

But that topic just says to use third-party components, without explaining exactly how to do it.

EDIT: As suggested, I tried using the Pos, Delete and Insert functions with a WideString variable:

function addEmoji(mystring: WideString): WideString;
var
  r, aux: WideString;
  p: Integer;
begin
  r := mystring;
  while Pos('(y)', r) > 0 do
  begin
    aux := r;
    p := Pos('(y)', aux);
    Insert('👍', aux, p);
    Delete(aux, Pos('(y)', aux), 3);
    r := aux;
  end;
  Result := r;
end;

But the result is still '(y)' replaced by '????'.

delphirules
  • @RemyLebeau Thank you. Is there any function similar to AnsiReplaceStr that could do the job without losing the emoji data? – delphirules Mar 28 '17 at 19:25
  • @RemyLebeau I tried using these functions, but they still replace the emoji with '???'. I just edited my question with this new approach. – delphirules Mar 28 '17 at 19:44
  • I find it curious that people would downvote a question when the answer already has +6 points... – delphirules Mar 29 '17 at 12:06

1 Answer

In Delphi 2007, the default string type is AnsiString. Emojis require Unicode handling, as they use high Unicode codepoints that simply do not fit/exist in most commonly used Ansi encodings. So you need to use a Unicode UTF encoding instead (UTF-7, -8, -16, or -32).

You can use AnsiString for UTF-7 (note 1), UTF8String for UTF-8 (note 2), WideString for UTF-16, or UCS4String for UTF-32 (note 3).

1: UTF-7 is a 7-bit ASCII compatible encoding.

2: UTF8String does exist in Delphi 2007 (it was introduced in Delphi 6), but it is not a true UTF-8 string type. It is just an alias for AnsiString, with the expectation that it always holds UTF-8 encoded data. You have to use UTF8Encode() and UTF8Decode() to ensure proper conversions to other encodings via UTF-16. UTF8String did not become a true UTF-8 string type until Delphi 2009 (at which point UTF8Encode() and UTF8Decode() were also deprecated).

3: UCS4String also exists since Delphi 6, but it is not a true string type at all (even in modern Delphi versions). It is just an alias for array of UCS4Char.

The RTL doesn't have any native support for UTF-7 (but it is not hard to implement manually), and very little support for UTF-32 (only to facilitate conversions between UTF-16 <-> UTF-32), so you should stick with UTF-8 or UTF-16 in your code.

You are going to lose Emoji data if you convert UTF data to Ansi, such as if you pass a WideString to ShowMessage(). You can pass a WideString to the Win32 API MessageBoxW() function instead, and you won't have any data loss, however the Emoji may or may not appear correctly depending on the font used by the dialog (but it won't appear as ??, at least).

However, the native RTL in Delphi 2007 simply does not support what you are attempting, at least not for UTF-16. You would have to find a 3rd party WideString-based function, or just write your own using the RTL's Pos(), Delete(), and Insert() intrinsic functions, which are overloaded for WideString data, eg:

function WideReplaceStr(const S, FromText, ToText: WideString): WideString;
var
  I: Integer;
begin
  Result := S;
  repeat
    I := Pos(FromText, Result);
    if I = 0 then Break;
    Delete(Result, I, Length(FromText));
    Insert(ToText, Result, I);
  until False;
end;

var
  s : WideString;
begin
  s := 'This is my original string (y)';
  s := WideReplaceStr(s, '(y)', '👍');
  MessageBoxW(0, PWideChar(s), '', MB_OK);
end; 
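The Pos()/Delete()/Insert() loop above rescans the whole string on every pass, so it never terminates if ToText happens to contain FromText. As a rough sketch only, and not a production-ready replacement, the variant below consumes the input from left to right and never rescans text it has already emitted. The name WideReplaceStrSafe is hypothetical (it is not an RTL function), and the sketch assumes the same WideString overloads of Pos(), Copy(), and Delete() that are relied on above.

```pascal
program SafeReplaceDemo;

{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper, not part of the RTL: replaces every occurrence of
// FromText in S with ToText. Matched text is moved out of Rest as it is
// found, so inserted text is never searched again.
function WideReplaceStrSafe(const S, FromText, ToText: WideString): WideString;
var
  Rest: WideString;
  I: Integer;
begin
  Result := '';
  Rest := S;
  if FromText <> '' then // an empty search string would match nothing useful
    repeat
      I := Pos(FromText, Rest);
      if I = 0 then Break;
      // Keep everything before the match, then append the replacement
      Result := Result + Copy(Rest, 1, I - 1) + ToText;
      // Drop the kept prefix and the match itself from the remaining input
      Delete(Rest, 1, I - 1 + Length(FromText));
    until False;
  Result := Result + Rest; // whatever is left contains no further match
end;

var
  S: WideString;
begin
  // ToText contains FromText here, the case that traps the simpler loop
  S := WideReplaceStrSafe('a(y)b(y)', '(y)', '[(y)]');
  MessageBoxW(0, PWideChar(S), 'Result', MB_OK);
end.
```

The trade-off is extra string copying on each match, which should be acceptable for short message texts.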

However, using UTF-8, you can accomplish the same thing using the native RTL, but you still can't use ShowMessage() (well, you could, but it won't show non-ASCII characters correctly):

var
  s : UTF8String;
begin
  s := UTF8Encode('This is my original string (y)');
  s := AnsiReplaceStr(s, '(y)', UTF8Encode('👍'));
  MessageBoxW(0, PWideChar(UTF8Decode(s)), '', MB_OK);
end;

Either way, make sure your code editor is set to save the .pas file in UTF-8. Otherwise you can't use the literal '👍' in source code, and you would have to build the emoji from its UTF-16 code units, something more like this:

var
  Emoji: WideString;

SetLength(Emoji, 2);
Emoji[1] := WideChar($D83D);
Emoji[2] := WideChar($DC4D);

Then you can do this:

var s: WideString;
...
s := WideReplaceStr(s, '(y)', Emoji);

Or:

var s: UTF8String;
...
s := AnsiReplaceStr(s, '(y)', UTF8Encode(Emoji));
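
The two WideChar values used for Emoji above are the UTF-16 surrogate pair for the emoji's Unicode code point. As a minimal sketch, assuming you know the code point (for example from a Unicode table), the pair can be computed rather than looked up. CodePointToWide is a hypothetical helper name, not an RTL function.

```pascal
program SurrogateDemo;

{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper, not part of the RTL: encodes a single Unicode
// code point as a UTF-16 WideString (one or two WideChars).
// No validation is done; CP is assumed to be a valid code point.
function CodePointToWide(CP: Cardinal): WideString;
begin
  if CP < $10000 then
  begin
    // Code points in the Basic Multilingual Plane fit in one WideChar
    SetLength(Result, 1);
    Result[1] := WideChar(CP);
  end
  else
  begin
    // Higher code points are split into a high and a low surrogate
    Dec(CP, $10000);
    SetLength(Result, 2);
    Result[1] := WideChar($D800 or (CP shr 10));   // top 10 bits
    Result[2] := WideChar($DC00 or (CP and $3FF)); // bottom 10 bits
  end;
end;

var
  Emoji: WideString;
begin
  Emoji := CodePointToWide($1F44D); // yields $D83D $DC4D
  MessageBoxW(0, PWideChar(Emoji), 'Emoji', MB_OK);
end.
```

Building the string this way also sidesteps the question of which encoding the .pas file is saved in, since the source stays pure ASCII.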
Remy Lebeau
  • Thank you, I tried your code after some adjustments (replaced 'To: WideString' with 'New: WideString', since 'To' can't be used as a variable name, and added the function return value), but it still replaces the emoji with '????' – delphirules Mar 28 '17 at 19:50
  • @delphirules: works fine for me when I try it. For good measure, I even went all the way back to Delphi 7 to test it. – Remy Lebeau Mar 28 '17 at 20:16
  • @RemyLebeau: Won't `WideReplaceStr` get stuck in a loop if `ToText` happens to contain `FromText`? – MartynA Mar 28 '17 at 20:35
  • @MartynA: Probably. `PosEx()` would be better than `Pos()`, except that it does not support `WideString`. `AnsiReplaceStr()` is just a wrapper for `StringReplace()`, which also does not support `WideString`. This was just a quick demo, obviously a more production-ready solution would be needed. – Remy Lebeau Mar 28 '17 at 21:20
  • I don't know why, but none of these solutions worked for me. I end up with '(y)' replaced by '???' or, in the case of UTF8Encode, with a lot of '1/2' symbols – delphirules Mar 29 '17 at 12:40
  • @RemyLebeau By the way, I'm curious about 'WideChar($D83D);'. Where do I find these codes for the emojis? – delphirules Mar 29 '17 at 12:42
  • 1
    @delphirules all I can tell you is that it works fine for me. Makes me think you haven't set it up correctly. As for the WideChar values, `` is Unicode codepoint `U+1F44D THUMBS UP SIGN`, which is encoded in UTF-16 surrogates as `$D83D $DC4D`. See [Emoji Unicode Tables](https://apps.timwhitlock.info/emoji/tables/unicode). – Remy Lebeau Mar 29 '17 at 14:22
  • @RemyLebeau OK, I will check my code. Thank you very much for your help! – delphirules Mar 30 '17 at 19:23