
This is for handling Facebook webhooks.

An event string arrives like this:

{"object":"page","entry":[{"id":"222222222","time":1536713510549,"messaging":[{"sender":{"id":"1111111111"},"recipient":{"id":"355433484576638"},"timestamp":1536713509901,"message":{"mid":"VnOoUhb2FUTyfnkXtmKgqDCfJlgJPB_n1gj-8aC6ka4-Oo2GjMXS82vHH9ChydJrPX_5Zu3sJ6skCv8JToF1IA","seq":206765,"text":"Jeg m\u00e5 bare si at jeg elsker Obosbladet! St\u00e5 p\u00e5 videre! \ud83d\ude00\ud83d\ude2c\ud83d\ude01\ud83d\ude02\ud83d\ude03\ud83d\ude04"}}]}]}

This is deserialized using

Dim TestObj As RealTimeEvent  = JsonConvert.DeserializeObject(Of RealTimeEvent)(eventStr)

At this point, if I view the TestObj message in the debugger, I see

"Jeg må bare si at jeg elsker Obosbladet! Stå på videre! "

Note that the Swedish characters have been handled correctly, but the Java-escaped emoticons have not (\ud83d\ude00\ud83d\ude2c\ud83d\ude01\ud83d\ude02\ud83d\ude03\ud83d\ude04).

If I then try to serialize the object again

JsonConvert.SerializeObject(TestObj)

I get

{""RawEvent"":"""",""object"":""page"",""entry"":[{""id"":""355433484576638"",""time"":""1536713510549"",""changes"":null,""messaging"":[{""optin"":null,""read"":null,""postback"":null,""sender"":{""id"":""975511412531391""},""recipient"":{""id"":""355433484576638""},""timestamp"":""1536713509901"",""message"":{""mid"":""VnOoUhb2FUTyfnkXtmKgqDCfJlgJPB_n1gj-8aC6ka4-Oo2GjMXS82vHH9ChydJrPX_5Zu3sJ6skCv8JToF1IA"",""seq"":""206765"",""text"":""Jeg må bare si at jeg elsker Obosbladet! Stå på videre! "",""attachments"":null}}]}]}

The Swedish characters are converted, which is what I want, but I have no chance of handling the emoticons.

Is there any way I can preserve everything that is not understood by the Newtonsoft deserializing process, but keep the conversion of Swedish and other characters?

---Edit: adding an explanation of what I am trying to achieve--- I need to be able to access the original definition of the emoticons: "\ud83d\ude00\ud83d\ude2c\ud83d\ude01\ud83d\ude02\ud83d\ude03\ud83d\ude04". I am integrating with another system that cannot handle emoticons at all. I have written a 'translator' which parses the message text looking for the Java-escaped data. I take the whole emoticon definition (all pairs) and reduce it until I find a matching definition. Perhaps there is a way to tell the serializer not to convert any escaped values and to keep the message text 'raw'? (I have tried various JsonSerializerSettings but have not found one that does this.)
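
A possible alternative to keeping the text raw, offered as a sketch rather than a confirmed fix: assuming the deserialized text still holds each emoticon as a UTF-16 surrogate pair (the debugger may simply not render it), the escaped form can be rebuilt after deserialization. `EmojiEscaper` and `EscapeSurrogates` are hypothetical names introduced here for illustration; `Char.IsSurrogate`, `Convert.ToInt32` and `StringBuilder` are standard .NET calls.

```vbnet
Imports System
Imports System.Text

Module EmojiEscaper
    ' Hypothetical helper (not part of Json.NET): rewrites every UTF-16
    ' surrogate code unit as a literal \uXXXX escape and leaves all other
    ' characters, such as å, exactly as they are.
    Function EscapeSurrogates(ByVal text As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            If Char.IsSurrogate(c) Then
                ' "x4" formats the code unit as four lowercase hex digits.
                sb.Append("\u").Append(Convert.ToInt32(c).ToString("x4"))
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Build a sample string without relying on source-file encoding.
        Dim sample As String = "Stå på videre! " & Char.ConvertFromUtf32(&H1F600)
        Console.WriteLine(EscapeSurrogates(sample))
    End Sub
End Module
```

For the sample above the function returns `Stå på videre! \ud83d\ude00`, so a translator that parses for `\ud` could run on the result while the å conversion is kept. This only covers characters outside the Basic Multilingual Plane; symbols escaped as `\u2...` are ordinary single characters and would need an extra range check.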

Rob Den Boer
  • Which IDE are you using? I tried watching it in the Watch window of VS2012, and it displays correctly. – OfirD Sep 12 '18 at 08:37
  • What exactly is the problem? 😀 has a Unicode value of U+1F600: https://www.fileformat.info/info/unicode/char/1f600/index.htm. Thus it's not in the Basic Multilingual Plane and so must be represented by a high and low [surrogate pair](https://learn.microsoft.com/en-us/dotnet/api/system.char.issurrogate?redirectedfrom=MSDN&view=netframework-4.7.2#System_Char_IsSurrogate_System_Char_). The sending system escaped them as `\ud83d\ude00`, but as per http://www.json.org/ this escaping is optional and the actual Unicode characters can be included directly in the string. – dbc Sep 12 '18 at 10:40
  • Thanks, guys. Sorry, this is a little new to me (the Unicode stuff). I have edited the question to explain what I am trying to achieve. HeyJude: VS2015 for me :) Many thanks for your time. – Rob Den Boer Sep 12 '18 at 21:09
  • The byte sequences `\ud83d\ude00` aren't *java escaped data*. While they look the same, they are actually JSON character literals as explained in the [standard](https://tools.ietf.org/html/rfc7159): `char = unescaped / escape ( ... %x75 4HEXDIG ) ; uXXXX U+XXXX`. Json.NET converts character literals to the actual Unicode characters at a very low level, so there's no way to access the underlying byte stream when reading a string. But it does correctly convert `\ud83d\ude00` to 😀, so can't you just check the resulting string for emojis? – dbc Sep 12 '18 at 23:57
  • To check for emojis in a string, see e.g. [Find out if Character in String is emoji?](https://stackoverflow.com/q/30757193), which would need porting to .NET. https://en.wikipedia.org/wiki/Emoji#Unicode_blocks should tell you the Unicode blocks in which you might find emojis. – dbc Sep 12 '18 at 23:59
  • But since a valid, well-formed JSON string could have contained the **actual** Emoji characters rather than escaped Emoji characters, you really need to check for Emojis when **exporting** to the system with which you are integrating, rather than checking during importing for escaping that isn't actually required anyway. – dbc Sep 13 '18 at 00:02
  • Thanks, guys. I don't think I have to worry about actual emoticons, as this is only from Facebook Messenger. I believe that long string is a single emoji, with all the additional pairs defining the type, color, etc.? Anyway, I called them Java escaped only because I found them described that way on sites such as https://www.charbase.com/1f64c-unicode-person-raising-both-hands-in-celebration – Rob Den Boer Sep 13 '18 at 01:26
  • I may have a very bad hack that appears to let me continue. If I do a replace on the entire JSON event from Facebook, EventStr = EventStr.Replace("\ud","\\ud"), then I get this string as a result for the message text: ""text"":""Jeg må bare si at jeg elsker Obosbladet! Stå på videre! \\ud83d\\ude00\\ud83d\\ude2c\\ud83d\\ude01\\ud83d\\ude02\\ud83d\\ude03\\ud83d\\ude04"". Then I can handle the decoding of the emoticons by parsing for \\ud instead of \ud. I'll try implementing and testing this on traffic. – Rob Den Boer Sep 13 '18 at 01:26
  • Ahh, I also need to cover \u2 as well. I'd better look into this more. Thanks @dbc, some great links there. – Rob Den Boer Sep 13 '18 at 01:47
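
To make the surrogate-pair point from the comments concrete, here is a minimal sketch of checking the deserialized string at export time instead of patching the raw JSON. It assumes the text already contains real Unicode characters. `CodePointDemo`, `ToCodePoints` and `IsEmojiCandidate` are hypothetical names, and the range used for `\u2...` symbols (U+2600 to U+27BF, the Miscellaneous Symbols and Dingbats blocks) is an assumption for illustration, not a complete definition of emoji.

```vbnet
Imports System
Imports System.Collections.Generic

Module CodePointDemo
    ' Hypothetical helper: returns the Unicode code points in a string,
    ' combining each high/low surrogate pair into a single value.
    Function ToCodePoints(ByVal text As String) As List(Of Integer)
        Dim result As New List(Of Integer)
        Dim i As Integer = 0
        While i < text.Length
            If Char.IsSurrogatePair(text, i) Then
                result.Add(Char.ConvertToUtf32(text, i))
                i += 2
            Else
                ' BMP character (or a lone surrogate): keep its own value.
                result.Add(Convert.ToInt32(text(i)))
                i += 1
            End If
        End While
        Return result
    End Function

    ' Assumed rule for this sketch: anything outside the BMP, plus
    ' U+2600 to U+27BF, is treated as an emoji candidate.
    Function IsEmojiCandidate(ByVal codePoint As Integer) As Boolean
        Return codePoint > &HFFFF OrElse
               (codePoint >= &H2600 AndAlso codePoint <= &H27BF)
    End Function

    Sub Main()
        Dim sample As String = "på " & Char.ConvertFromUtf32(&H1F600) &
                               Char.ConvertFromUtf32(&H1F62C) & ChrW(&H263A)
        For Each cp As Integer In ToCodePoints(sample)
            Console.WriteLine("U+{0:X4} emoji candidate: {1}", cp, IsEmojiCandidate(cp))
        Next
    End Sub
End Module
```

Decoded this way, the six escaped pairs in the sample event are six separate code points (U+1F600, U+1F62C, U+1F601, U+1F602, U+1F603, U+1F604), one per pair, rather than a single emoji with modifiers.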

0 Answers