
I am receiving emoji from a source system as escaped text, in the following format:

  • '\u2764' for ❤
  • '\ud83d\udc4d' for 👍

How can I convert these so that they display as proper emoji in the Unity Debug log and in Text fields? Please help.

When I try the code below

            socket = listener.Accept();
            int bytesRec = socket.Receive(receiveBuffer);
            data = Encoding.UTF8.GetString(receiveBuffer, 0, bytesRec); 
            Debug.Log(data);

I get

  • '\u2764' for ❤
  • '\ud83d\udc4d' for 👍

When I try the code below

            socket = listener.Accept();
            int bytesRec = socket.Receive(receiveBuffer);
            data = Encoding.Unicode.GetString(receiveBuffer, 0, bytesRec); 
            Debug.Log(data);

I get

  • '畜㜲㐶' for ❤
  • '畜㡤搳畜捤搴' for 👍

Thanks Jos

  • Afaik that depends on the font you are using .. – derHugo May 01 '21 at 20:20
  • I had to deal with this about two years ago. You can check out [my solution](https://stackoverflow.com/questions/55082644/c-sharp-regular-expression-to-find-a-surrogate-pair-of-a-unicode-codepoint-fro). It is not exactly the answer you need, but all of the information is there. It is using PHP, Regex, Websockets, and Unity/C#. – TEEBQNE May 02 '21 at 01:30
  • I have written a PowerShell script to convert string literal-like expressions to characters (e.g. `'\x64'`, `'\u2764'`, `'\U0001F602'` or `'\ud83d\udc4d'` to `d`, `❤`, `😂` and `👍`, respectively). Unfortunately, my C# compiler is currently broken. I could post my Posh solution if you want… Based on `.NET` so converting it to `C#` should be easy… – JosefZ May 02 '21 at 09:53
  • @derHugo: Let me check it out with different fonts... – Tamil Ninja May 02 '21 at 14:20
  • @teebqne: I will go through it now... – Tamil Ninja May 02 '21 at 14:21
  • @josefz: Please do share.. thank you. it might be helpful – Tamil Ninja May 02 '21 at 14:21

1 Answer


The PowerShell code, HTH:

Function Get-PythonString {
    [CmdletBinding()]
    [OutputType([System.String],[System.Int32[]])]
param(
    [Parameter(Position=0, Mandatory, ValueFromPipeline)] [String]$pyStr='',
    [Parameter()] [Switch] $AsArray
)
    $retArr = [System.Collections.ArrayList]::new()
    $retStr = ''
    $highSur= ''
    $i=0
    while ( $i -lt $pyStr.Length ) {
        if ( $pyStr.Chars($i) -eq '\' ) {
            if ( $pyStr.Chars($i +1 ) -ceq 'U' ) {
                $iAux = [int]("0x" + ($pyStr.Substring( $i+2, 8)))
                $i += 10
            } elseif ( $pyStr.Chars($i +1 ) -ceq 'u' ) {
                $iAux = [int]("0x" + ($pyStr.Substring( $i+2, 4)))
                $i += 6
            } elseif ( $pyStr.Chars($i +1 ) -ceq 'x' ) {
                $iAux = [int]("0x" + ($pyStr.Substring( $i+2, 2)))
                $i += 4
            } else {
                $iAux = [int]$pyStr.Chars( $i )
                $i++
            }
        } else {
            $iAux = [int]$pyStr.Chars( $i )
            $i++
        }
        if ( $iAux -gt 0xFFFF ) { # out of BMP
            [void]$retArr.Add( [int]$iAux)
            $retStr += [char]::ConvertFromUtf32( $iAux)
        } else {
            if ( [char]::IsHighSurrogate( [char]$iAux )) {
                $highSur = [char]$iAux
            } else {
                if ( [char]::IsLowSurrogate( [char]$iAux )) {
                    $iAux = [int][char]::ConvertToUtf32( $highSur, [char]$iAux)
                    $highSur = ''
                }
                [void]$retArr.Add( [int]$iAux)
                $retStr += [char]::ConvertFromUtf32( $iAux)
            }
        }
    }
    if ($AsArray.IsPresent) {
        $retArr
    } else {
        $retStr
    }
}

Escape sequences only recognized in Python string literals are:

Escape Sequence  Meaning
\xnn             Character with  8-bit hex value nn
\unnnn           Character with 16-bit hex value nnnn
\Unnnnnnnn       Character with 32-bit hex value nnnnnnnn
\N{name}         Character named name in the Unicode database (not implemented)

Usage examples:

Get-PythonString -pyStr "\x65\x78\x61\x6D\x70\x6C\x65, \u2764, \U0001F602 or \ud83d\udc4d"
example, ❤, 😂 or 👍
"\x65\x78\x61\x6D\x70\x6C\x65, \u2764, \U0001F602 or \ud83d\udc4d"|Get-PythonString
example, ❤, 😂 or 👍
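
Since the question is about Unity, here is a rough C# sketch of the same idea. It is not a line-by-line port of the function above: the `EscapeDecoder` class and its `Decode` method are hypothetical names made up for this illustration, and it assumes the received text contains only `\xnn`, `\unnnn` and `\Unnnnnnnn` escapes (anything else is copied through unchanged). Because .NET strings are UTF-16, appending the two halves of a pair such as `\ud83d\udc4d` one after the other should already yield the single emoji, so no explicit surrogate bookkeeping is done here.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EscapeDecoder
{
    // Hypothetical helper (not from the original answer): turns literal
    // \xnn, \unnnn and \Unnnnnnnn escapes in the text into real characters.
    public static string Decode(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // Number of hex digits expected after the escape prefix, 0 if none.
            int digits = 0;
            if (input[i] == '\\' && i + 1 < input.Length)
            {
                switch (input[i + 1])
                {
                    case 'U': digits = 8; break;
                    case 'u': digits = 4; break;
                    case 'x': digits = 2; break;
                }
            }

            if (digits > 0
                && i + 2 + digits <= input.Length
                && int.TryParse(input.Substring(i + 2, digits),
                                NumberStyles.AllowHexSpecifier,
                                CultureInfo.InvariantCulture,
                                out int code)
                && code >= 0 && code <= 0x10FFFF)
            {
                if (code > 0xFFFF)
                {
                    // Outside the BMP: emit the matching surrogate pair.
                    sb.Append(char.ConvertFromUtf32(code));
                }
                else
                {
                    // BMP value or a single surrogate half: one UTF-16 unit.
                    sb.Append((char)code);
                }
                i += 2 + digits;
            }
            else
            {
                // Not a recognised escape: copy the character as-is.
                sb.Append(input[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

In the question's receive code this would presumably be used as `Debug.Log(EscapeDecoder.Decode(data));` after the `Encoding.UTF8.GetString(...)` call. Whether the emoji then actually renders in a Text field still depends on the font, as noted in the comments.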
JosefZ