
I've written the function below.

It successfully reverses a string without garbling any emoji characters.

The only trouble is that it doesn't reverse the emojis themselves, which is what I was hoping to do.

Here's the code I have with some examples:

Function Out-ReverseString {

  [cmdletbinding()]
  Param (
    [Parameter(Mandatory = $true, HelpMessage = 'Passed string to reverse')]
    [string]$StringInput
    )

  $returnedNewClip = -join[regex]::Matches($StringInput, '([^\x00-\x7F]+|.)', 'RightToLeft')
  return $returnedNewClip

}

Out-ReverseString -StringInput "'What ‍‍ is this?' ‍‍ "  # Returns:  ‍‍ '?siht si ‍‍ tahW'
Out-ReverseString -StringInput ".sijome ekil  I"        # Returns: I  like emojis.
Out-ReverseString -StringInput ". ekil t'nseod ?néiuQ¿"      # Returns: ¿Quién? doesn't like .
Out-ReverseString -StringInput "195981348903269-335"           # Returns: 533-962309843189591
Out-ReverseString -String "¿Quién doesn't like ?"            # Returns: ? ekil t'nseod néiuQ¿
Ste

2 Answers


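The problem, as noted in the comments, is that emojis are multi-byte characters: each one is encoded as several bytes, and in a .NET string it usually takes more than one UTF-16 `char`.

The Cat Face with Tears of Joy emoji, U+1F639, is F0 9F 98 B9 in UTF-8.

The Hugging Face emoji, U+1F917, is F0 9F A4 97 in UTF-8.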
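
To check this from PowerShell itself, here is a small sketch (the variable names are only illustrative). It builds the character from its code point, so the result does not depend on how the script file is encoded, then shows its UTF-8 bytes and its length in UTF-16 code units, which is what a .NET string actually stores:

```powershell
# Build U+1F639 from its code point to avoid depending on the script file's encoding
$cat = [char]::ConvertFromUtf32(0x1F639)

# UTF-8 bytes of the emoji, formatted as two-digit hex values
$utf8 = [System.Text.Encoding]::UTF8.GetBytes($cat)
$hex = ($utf8 | ForEach-Object { $_.ToString('X2') }) -join ' '
$hex          # F0 9F 98 B9

# .NET strings are UTF-16, so this one emoji occupies two [char] values (a surrogate pair)
$cat.Length   # 2
```

Reversing the string one `[char]` at a time would swap the two halves of that surrogate pair, which is what garbles the emoji.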

To reverse a string containing such characters, one needs a grapheme-aware iterator, which understands that an emoji consists of several bytes, unlike the characters of the usual Latin alphabet. .NET's TextElementEnumerator does exactly that. Iterate over the string with it to get graphemes rather than raw bytes, then reverse the result. There's an old answer that does this; converted to PowerShell, it looks like this:

# Load the globalization assembly for later use
Add-Type -AssemblyName System.Globalization
$str = "'What ‍‍ is this?' ‍‍ "
$se = [System.Globalization.StringInfo]::GetTextElementEnumerator($str)
$ll = @()
while($se.MoveNext()) { $ll += $se.GetTextElement() }
[array]::reverse($ll)
$rev =  $ll -join ''
$str
'What ‍‍ is this?' ‍‍ 
$rev
 ‍‍ '?siht si ‍‍ tahW'
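
The same steps can be wrapped in a reusable function. This is only a sketch: the name `ConvertTo-ReversedText` is made up for illustration, and it collects the text elements in a generic list instead of growing an array with `+=`, which copies the array on every append. As far as I know, emoji joined with zero-width joiners are only treated as a single text element on .NET 5 and later (PowerShell 7.1+); on Windows PowerShell 5.1 surrogate pairs and combining marks stay intact, but joined sequences are split.

```powershell
function ConvertTo-ReversedText {   # hypothetical name, not a built-in cmdlet
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$StringInput
    )

    # Walk the string one text element (grapheme) at a time
    $enumerator = [System.Globalization.StringInfo]::GetTextElementEnumerator($StringInput)
    $elements = [System.Collections.Generic.List[string]]::new()
    while ($enumerator.MoveNext()) {
        $elements.Add($enumerator.GetTextElement())
    }

    # Reverse the order of the elements, leaving each element's contents untouched
    $elements.Reverse()
    $elements.ToArray() -join ''
}

# Usage: the emoji is built from its code point, so the file encoding does not matter
$cat = [char]::ConvertFromUtf32(0x1F639)
$reversed = ConvertTo-ReversedText -StringInput ('I like ' + $cat + '.')
$reversed   # the emoji stays intact between '.' and ' ekil I'
```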
vonPryz
  • Thanks very much for that. Where did you test this because in PowerShell ISE I get ` ‍‍ '?siht si ‍‍ tahW'` The `‍‍` aren't joined to get the `‍‍`. – Ste Apr 19 '21 at 17:41
  • @Ste PowerShell Core 7.1.0 on macOS. – vonPryz Apr 19 '21 at 18:03
  • With 5.1 in ISE on Windows I get the results above. I wonder if something has changed with those methods since. – Ste Apr 19 '21 at 18:30
  • @Ste If you run it in the command-line version of PowerShell instead of ISE, does it work any better? – vonPryz Apr 19 '21 at 18:36
  • When I paste this `"'What ‍‍ is this?' ‍‍ "` I get `'What ???????? is this?' ???????? ????"` so the command line doesn't understand those characters. – Ste Apr 19 '21 at 18:43
  • I've tested this exact code in a 7.1 portable build of PS and it returns the correct result. It's the line `[System.Globalization.StringInfo]::GetTextElementEnumerator($str)` which in 5.1 doesn't recognise the `‍‍`. – Ste Apr 20 '21 at 13:19
  • 5.1 needs utf8 *with* bom encoding in the script. – js2010 Apr 20 '21 at 18:14
  • @js2010, sorry I missed your reply. Yes, utf8 with bom is what it's saved as. – Ste Apr 23 '21 at 10:34

PowerShell 7 has an `EnumerateRunes()` method for strings:

$emojis = ''
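# Wrap in @() so $a is an array even when the string holds a single rune
$a = @($emojis.EnumerateRunes() | ForEach-Object { "$_" })
# Index from the last element down to the first
-join $a[($a.Count - 1)..0]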
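
One limitation, which a comment on this answer also points out: a rune is a single Unicode code point, not a grapheme. The sketch below (PowerShell 7+ only, variable names illustrative) uses an `e` followed by a combining acute accent to show the difference. Reversing by rune moves the accent in front of the letter it belonged to.

```powershell
# Requires PowerShell 7+; EnumerateRunes() is not available in Windows PowerShell 5.1
$text = 'e' + [char]0x0301      # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Two code points (runes) ...
$runes = @($text.EnumerateRunes() | ForEach-Object { "$_" })
$runes.Count                    # 2

# ... but a single text element (grapheme)
$graphemeCount = [System.Globalization.StringInfo]::new($text).LengthInTextElements
$graphemeCount                  # 1

# Reversing by rune puts the combining accent first, detaching it from the 'e'
$reversedByRune = -join $runes[($runes.Count - 1)..0]
```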


js2010
  • Thanks for that but I'm using 5.1. That's a great answer otherwise. – Ste Apr 19 '21 at 17:29
  • `EnumerateRunes()` enumerates Unicode code points, not Unicode graphemes, so it won't work for composed or joined characters, only for simple multi-byte characters like the ones you showcased. – noraj Jun 15 '23 at 09:38