12

I am working on a chat application in WPF and I want to use emoticons in it. I want to read emoticons that come from Android/iOS devices and show the respective images.

In WPF, I am getting a black emoticon looking like this. I have a library of emoji icons that are saved under their respective hex/escaped Unicode values. So I want to convert these emoticon symbols into UTF-32/escaped Unicode so that I can directly replace them with the related emoji icons.

I tried to convert an emoticon to its Unicode value but ended up with a different string containing a couple of symbols, each with a different Unicode value.

string unicodeString = "\u1F642";  // intended to represent 🙂 (U+1F642)

Encoding unicode = Encoding.Unicode;
byte[] unicodeBytes = unicode.GetBytes(unicodeString);

char[] unicodeChars = new char[unicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length)];
unicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, unicodeChars, 0);
string asciiString = new string(unicodeChars);

Any help is appreciated!!

Rand Random
Joker_37
  • What do you mean by "emoticons which are coming from Android/iOS devices"? I definitely thought you would get them already in Unicode and not as an image or whatever you are talking about?!? – Rand Random Jun 23 '17 at 19:34
  • Char.ConvertFromUtf32(0x1F642) would give you the UTF-16 representation/proper C# Unicode string – ckuri Jun 23 '17 at 19:36
  • @RandRandom Actually the app is cross platform so it receives emoticons from Android and iOS devices, which I want to detect in WPF client. – Joker_37 Jun 24 '17 at 09:50
  • Yeah, got that part, but I believe you didn't get me. Your question is "How to convert emoticons to its UTF-32/escaped unicode" and I am telling you that I don't believe you are receiving the chat message from Android/iOS in any other way than as UTF-32/escaped Unicode. My guess is you are already receiving something like "Hello dear. I hope wont fail the test. \u1F642", so my question was: if that is the case, why do you need/want to transform that into a byte array and then again into a string, when you already have what you are asking for in the first place? – Rand Random Jun 24 '17 at 21:28
  • @RandRandom No, I have shared an image of the smileys in my question. I am receiving those emojis in that format. In WPF, each is converted into a symbol which represents the smiley; please refer to my question, where I have shared an image. So the issue is that I am getting a symbol which is present in the 'Segoe UI Emoji' font family in WPF, not a UTF-32 code. So I want to know whether there is any way I can convert those symbols to UTF-32/escaped Unicode. – Joker_37 Jun 25 '17 at 09:39

4 Answers

20

Your escaped Unicode string does not mean what you intend in C#.

string unicodeString = "\u1F642";  // intended to represent 🙂 (U+1F642)

This piece of code doesn't represent the "slightly smiling face", because the \u escape in C# takes exactly four hex digits, i.e. a single UTF-16 code unit (2 bytes).

So what you actually get is the character U+1F64 followed by a plain 2. http://www.fileformat.info/info/unicode/char/1f64/index.htm

So this: ὤ2
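To check what the compiler makes of that literal, a small sketch like the following can help (the variable name `s` is only illustrative, and top-level statements assume C# 9 or later):

```csharp
using System;

// "\u1F642" is parsed as the escape \u1F64 followed by the literal character '2'.
string s = "\u1F642";

Console.WriteLine(s.Length);                    // 2
Console.WriteLine(((int)s[0]).ToString("X4"));  // 1F64
Console.WriteLine(s[1]);                        // 2
```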

If you want to specify a code point beyond U+FFFF in hex and get the corresponding string, you have to use:

var unicodeString = char.ConvertFromUtf32(0x1F642);

https://msdn.microsoft.com/en-us/library/system.char.convertfromutf32(v=vs.110).aspx

or you could write it like this:

\uD83D\uDE42

This string can then be converted like this to get your desired result, which is again the hex value that we started with:

var x = char.ConvertFromUtf32(0x1F642);

var enc = new UTF32Encoding(true, false);
var bytes = enc.GetBytes(x);
var hex = new StringBuilder();
for (int i = 0; i < bytes.Length; i++)
{
    hex.AppendFormat("{0:X2}", bytes[i]); // upper-case hex, two digits per byte
}
var o = hex.ToString();
//result is 0001F642

(The result has leading zeros, since a UTF-32 code unit is always 4 bytes.)

Instead of the for loop you can also use BitConverter.ToString(byte[]) https://msdn.microsoft.com/en-us/library/3a733s97(v=vs.110).aspx; the result will then look like this:

var x = char.ConvertFromUtf32(0x1F642);

var enc = new UTF32Encoding(true, false);
var bytes = enc.GetBytes(x);
var o = BitConverter.ToString(bytes);
//result is 00-01-F6-42
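For readers wondering where \uD83D\uDE42 comes from, the following sketch works through the standard UTF-16 surrogate arithmetic by hand and compares it with the built-in conversion. It also goes the other way with char.ConvertToUtf32, which may be the more direct route when all you need is the code point. The variable names are illustrative, and top-level statements assume C# 9 or later.

```csharp
using System;

int codePoint = 0x1F642;

// Manual UTF-16 surrogate computation for a code point above U+FFFF.
int offset = codePoint - 0x10000;
char high = (char)(0xD800 + (offset >> 10));    // upper 10 bits
char low = (char)(0xDC00 + (offset & 0x3FF));   // lower 10 bits

string manual = new string(new[] { high, low });
string builtIn = char.ConvertFromUtf32(codePoint);

Console.WriteLine($"\\u{(int)high:X4}\\u{(int)low:X4}"); // \uD83D\uDE42
Console.WriteLine(manual == builtIn);                    // True

// Reverse direction: surrogate pair back to the code point.
int roundTrip = char.ConvertToUtf32(high, low);
Console.WriteLine(roundTrip.ToString("X"));              // 1F642
```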
Rand Random
  • This is a great answer. One point of confusion for me was where \uD83D\uDE42 came from. To clarify for others, these are known as "surrogate pairs." You can find more information here: https://unicodebook.readthedocs.io/unicode_encodings.html. It's essentially the result of converting UTF-32 to two UTF-16 values. A simple conversion tool can be found here: http://trigeminal.fmsinc.com/16to32AndBack.asp – Jason Rae Aug 08 '19 at 13:56
1

Please be aware that Encoding.Unicode is UTF-16 in C#. To work with 32-bit Unicode, there is Encoding.UTF32 (see the MSDN documentation for Encoding.UTF32).
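As a rough sketch of that approach: Encoding.UTF32 is little-endian, so each group of four bytes can be reassembled into one code point. The sample message and variable names below are assumptions for illustration, and top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text;

string message = "Hi " + char.ConvertFromUtf32(0x1F642);

// Encoding.UTF32 is little-endian; GetBytes does not prepend a byte order mark.
byte[] bytes = Encoding.UTF32.GetBytes(message);

var hex = new StringBuilder();
for (int i = 0; i < bytes.Length; i += 4)
{
    // Assemble the four bytes explicitly as little-endian,
    // so the result does not depend on the platform's byte order.
    int codePoint = bytes[i] | (bytes[i + 1] << 8) | (bytes[i + 2] << 16) | (bytes[i + 3] << 24);
    hex.Append(codePoint.ToString("X")).Append(' ');
}

Console.WriteLine(hex.ToString().TrimEnd()); // 48 69 20 1F642
```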

Jimbot
1

Since C# string literals support the eight-digit \U escape for code points beyond U+FFFF, there is no need to use any encodings for this task.


Example 1.

var rgch = "\U0001F642".ToCharArray();
var str = $"\\u{(ushort)rgch[0]:X4}\\u{(ushort)rgch[1]:X4}";

Result: "\uD83D\uDE42"         Length of string str is 12 UTF-16 code units (24 bytes)



Example 2.

var rgch = "\U0001F642".ToCharArray();
var str = rgch[0] + "" + rgch[1];

Result: "🙂"             Length of string str is 2 UTF-16 code units (4 bytes)
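Building on the same idea, a received message can be walked character by character to pull out the code point of every surrogate pair, for example to look up an image in an icon library. This is only a sketch: the lower-case hex key format ("1f642") is a hypothetical naming scheme, the sample message is made up, and emoji inside the Basic Multilingual Plane (such as U+263A) are not surrogate pairs and would need separate handling. Top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

string message = "ok \U0001F642!";
var keys = new List<string>();

for (int i = 0; i < message.Length; i++)
{
    if (char.IsSurrogatePair(message, i))
    {
        // Combine the high and low surrogate at i and i + 1 into one code point.
        int codePoint = char.ConvertToUtf32(message, i);
        keys.Add(codePoint.ToString("x")); // hypothetical lookup key, e.g. "1f642"
        i++; // skip the low surrogate that was just consumed
    }
}

Console.WriteLine(string.Join(",", keys)); // 1f642
```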


Glenn Slayden
0

You can simply use @using System.Web for encoding:

var columndata = "CSR story with emoji ";
columndata = HttpUtility.UrlEncode(columndata);

It will encode the text and emoji.

Here I have text with HTML tags, so I call Trim() when decoding:

string titleRaw = HttpUtility.UrlDecode(@Model.columnNamne.ToString().Trim());

If not storing in HTML tags then:

string titleRaw = HttpUtility.UrlDecode(@Model.columnNamne.ToString());
Whit Waldo