70

How can I encode the Unicode character U+0048 (H), say, in a PowerShell string?

In C# I would just do this: "\u0048", but that doesn't appear to work in PowerShell.

Peter Mortensen
dan-gph

7 Answers

93

Replace '\u' with '0x' and cast it to System.Char:

PS > [char]0x0048
H

You can also use the "$()" syntax to embed a Unicode character into a string:

PS > "Acme$([char]0x2122) Company"
AcmeT Company

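Here T is how the legacy console renders U+2122 TRADE MARK SIGN (™, the symbol for non-registered trademarks) when its font or code page cannot display it. The string itself still contains the real character.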

Note: this method works only for characters in Plane 0, the BMP (Basic Multilingual Plane), chars < U+10000.
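As an alternative to "$()", the format operator -f can drop BMP characters into a string. This is only a sketch of that variation; the variable names are made up for illustration.

```powershell
# -f replaces each {n} placeholder with the matching right-hand argument.
$name = 'Acme{0} Company' -f [char]0x2122

# Several characters can be supplied at once as a comma-separated list.
$suits = '{0}{1}' -f [char]0x2660, [char]0x2665

$name          # Acme™ Company (the console may show a fallback glyph)
$suits.Length  # 2
```

Single quotes are used for the format strings so that nothing in them is expanded before -f runs.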

noraj
Shay Levy
  • You can even write a little function: function C($n) {[char][int]"0x$n"}. Which you can use in a string as follows: "$(C 48)ello World." Not ideal but probably a little closer to the \u escape. – Joey Jun 29 '09 at 09:29
  • This also works when you want to pass a unicode [char] to a function. Thanks for the help. – Sonamor Aug 21 '18 at 11:07
  • I know this topic is 2.5 years old, but following up on @Joey's comment, you can even make a function called `\u`. It's identical to Joey's, just with a different name. So the function is `function \u($n) {[char][int]"0x$n"}`. The way you call it is just like C# except that you need a space between the function name and the number. So `\u 0048` returns `H`. – chris Mar 10 '21 at 00:13
  • This only works for characters in BMP, else it triggers an error. Eg. `[char]0x1D400`: `InvalidArgument: Cannot convert value "119808" to type "System.Char". Error: "Value was either too large or too small for a character."` – noraj May 27 '22 at 15:25
28

According to the documentation, PowerShell Core 6.0 adds support with this escape sequence:

PS> "`u{0048}"
H

See https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_special_characters?view=powershell-6#unicode-character-ux
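A short sketch of the same escape with a few more cases, assuming PowerShell 6.0 or later (on Windows PowerShell 5.1 the backtick is simply dropped and the text u{...} stays in the string). The variable names are illustrative only.

```powershell
# Leading zeros are optional: the braces take 1 to 6 hex digits.
$greeting = "`u{48}ello"

# Code points above U+FFFF are accepted as well.
$smiley = "`u{1F60A}"

$greeting       # Hello
$smiley.Length  # 2, because .NET stores it as a UTF-16 surrogate pair
```

The escape is only processed in double-quoted strings; in single quotes it is kept literally.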

mclayton
17

Maybe this isn't the PowerShell way, but this is what I do. I find it to be cleaner.

[regex]::Unescape("\u0048") # Prints H
[regex]::Unescape("\u0048ello") # Prints Hello
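One caveat worth sketching: the \u escape understood by [regex]::Unescape takes exactly four hex digits, so a character above U+FFFF has to be written as its two UTF-16 surrogates. The example below assumes U+1F60A, whose surrogate pair is D83D DE0A.

```powershell
# Backslashes are not special in PowerShell strings; single quotes just
# rule out accidental $variable expansion.
$hello = [regex]::Unescape('\u0048ello')

# U+1F60A written as its high and low surrogate.
$smiley = [regex]::Unescape('\uD83D\uDE0A')

$hello                                           # Hello
$smiley -ceq [char]::ConvertFromUtf32(0x1F60A)   # True
```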
Peter Mortensen
Kevin Buchan
4

For those of us still on 5.1 and wanting to use characters outside the BMP, i.e. code points above U+FFFF (for which none of these answers work), I made this function so you can simply build strings like so:

'this is my favourite park ',0x1F3DE,'. It is pretty sweet ',0x1F60A | Unicode

#takes in a stream of strings and integers,
#where integers are unicode codepoints,
#and concatenates these into valid UTF16
Function Unicode {
    Begin {
        $output=[System.Text.StringBuilder]::new()
    }
    Process {
        $output.Append($(
            if ($_ -is [int]) { [char]::ConvertFromUtf32($_) }
            else { [string]$_ }
        )) | Out-Null
    }
    End { $output.ToString() }
}

Note that getting these to display in your console is a whole other problem, but if you're outputting to an Outlook email or a GridView it will just work (UTF-16 is the native string encoding in .NET).

This also means you can emit plain ASCII or control characters just as easily, and in decimal if you prefer, since the integers don't have to be written in hex. 'hello',32,'there' | Unicode puts an ordinary space (U+0020) between the two words, the same as 0x20 would. A non-breaking space would be 160 (0xA0).

Hashbrown
  • `[char]::ConvertFromUtf32` has been available since .NET Framework 2.0 so you don't need such a complex function – phuclv Mar 24 '20 at 05:39
  • oh neat. The function is still necessary, I'm not writing `[char]blahblahblah` whenever I want a `"\`u{}"`, but it does simplify the `if` – Hashbrown Mar 24 '20 at 06:15
  • besides `$_ -shr 10` should be used instead of `[int][math]::Floor($_ / 0x400)`, and `($_ -band 0x3FF) -bor 0xDC00` instead of `[char]($_ % 0x400 + 0xDC00)` – phuclv Mar 24 '20 at 06:16
  • I s'pose that's obvious since it was a nice even hex number, oh well. Doesn't matter now that .NET can handle the overarching problem – Hashbrown Mar 24 '20 at 06:21
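For reference, the shift-and-mask arithmetic mentioned in these comments can be sketched as below. ConvertTo-SurrogatePair is a hypothetical helper name, not the earlier revision of the answer, and it assumes its input is a valid code point in the range U+10000 to U+10FFFF (it does no checking). The -shr operator needs PowerShell 3.0 or later.

```powershell
function ConvertTo-SurrogatePair([int]$CodePoint) {
    # Surrogates encode the 20-bit offset from U+10000.
    $offset = $CodePoint - 0x10000
    # Top 10 bits go into the high surrogate (0xD800..0xDBFF).
    $high = [char](0xD800 -bor ($offset -shr 10))
    # Bottom 10 bits go into the low surrogate (0xDC00..0xDFFF).
    $low = [char](0xDC00 -bor ($offset -band 0x3FF))
    "$high$low"
}

$park = ConvertTo-SurrogatePair 0x1F3DE
$park -ceq [char]::ConvertFromUtf32(0x1F3DE)   # True
```

In practice [char]::ConvertFromUtf32 does the same work and also validates the input, so the manual version is mainly useful for seeing what happens underneath.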
3

Another way using PowerShell.

$Heart = $([char]0x2665)
$Diamond = $([char]0x2666)
$Club = $([char]0x2663)
$Spade = $([char]0x2660)
Write-Host $Heart -BackgroundColor Yellow -ForegroundColor Magenta

Use the command help Write-Host -Full to read all about it.

lit
  • [Shay Levy's answer above](https://stackoverflow.com/a/1056978/995714) already showed how to use `[char]0x2665`. In fact this is **far more inefficient** because you create a new subshell for each variable instead of assigning directly: `$Heart = [char]0x2665` – phuclv Oct 11 '20 at 00:53
1

To make it work for characters outside the BMP, use [char]::ConvertFromUtf32():

'this is my favourite park ' + [char]::ConvertFromUtf32(0x1F3DE) + 
'. It is pretty sweet ' + [char]::ConvertFromUtf32(0x1F60A)
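The same call also fits inside a double-quoted string through "$()", and [char]::ConvertToUtf32 goes the other way. A small sketch with illustrative variable names:

```powershell
$emoji = [char]::ConvertFromUtf32(0x1F3DE)
$text = "my favourite park $([char]::ConvertFromUtf32(0x1F3DE))"

# One code point, but two UTF-16 code units in the resulting string.
$emoji.Length                                   # 2

# Read the code point back from index 0 of the string.
'{0:X}' -f [char]::ConvertToUtf32($emoji, 0)    # 1F3DE
```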
phuclv
0

Note that characters outside the BMP, such as 🌎 (U+1F30E), need a "double rune", that is, a UTF-16 surrogate pair, when built from [char] values:

   PS> "C:\foo\bar\$([char]0xd83c)$([char]0xdf0e)something.txt"

Will print:

   C:\foo\bar\🌎something.txt

You can find these "runes" here, in the "unicode escape" row:

   https://dencode.com/string
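If you would rather not look the pair up on a website, .NET can compute it. The sketch below assumes the code point U+1F30E from the example above; the variable names are illustrative.

```powershell
# Code point to its two UTF-16 code units.
$globe = [char]::ConvertFromUtf32(0x1F30E)
$units = $globe.ToCharArray() | ForEach-Object { '0x{0:X4}' -f [int]$_ }
$units -join ', '            # 0xD83C, 0xDF0E

# And back: high and low surrogate to the code point.
$codePoint = [char]::ConvertToUtf32([char]0xD83C, [char]0xDF0E)
'U+{0:X}' -f $codePoint      # U+1F30E
```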
XDS