
I'm a Windows and PowerShell newbie coming from Linux land. I used to have a little Bash alias in my .bashrc that would copy a "shruggie" (¯\_(ツ)_/¯) to the clipboard so that I could paste it into conversations on Slack and such.

My Bash alias looked like this:

alias shruggie='printf "¯\_(ツ)_/¯" | xclip -selection c && echo "¯\_(ツ)_/¯"'

I realize that this question is juvenile, but the answer does have value to me, as I'm sure I will need to output or pipe odd UTF-8 characters from a PowerShell script at some point in the future.

I wrote this function in my PowerShell profile:

function shruggie() {
  '¯\_(ツ)_/¯' | clip
  Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}

However, when I call it on the command line, this gives me ??\_(???)_/?? (the unknown UTF-8 characters are converted to ?).

I've looked at [System.Text.Encoding]::UTF8 and some other questions, but I don't know how to cast my string as UTF-8, pass it through clip.exe, and get UTF-8 out on the other side (on the clipboard).

jonathanbell
  • `Set-Clipboard : The term 'Set-Clipboard' is not recognized as the name of a cmdlet, function, script file, or operable program` ... PowerShell hates me? ¯\\_(ツ)_/¯ – jonathanbell Dec 29 '17 at 01:00
  • Update your PowerShell to v5 – Maximilian Burszley Dec 29 '17 at 01:01
  • On PowerShell version 5.1 `'¯\_(ツ)_/¯' | Set-Clipboard` produces `¯\_(ツ)_/¯` So, still not straight UTF-8 passing through. Wonder how PowerShell < v5 handled this. – jonathanbell Dec 29 '17 at 03:48
  • Give `-AsHtml` a shot. It sounds like your profile encoding is wrong. – Maximilian Burszley Dec 29 '17 at 03:55
  • Also, the powershell.exe executable (traditionally, the console) is still relying on some older cmd.exe engine and doesn't like the full Unicode spec (even in v5.1). [Here's additional reading for the curious](https://stackoverflow.com/questions/5796339/printing-unicode-characters-to-the-powershell-prompt) – Maximilian Burszley Dec 29 '17 at 03:59
  • Wow... Maybe this isn't possible then?! Are we really still writing with Code Pages specs in PowerShell?! Perhaps! Doing this first in the PowerShell console: `[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(932)` and then `Set-Clipboard '¯\_(ツ)_/¯'` works! Seems that the [932 Code Page spec](https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)) is working here! Yikes! So, does PowerShell really not have full UTF-8 support? Seems so, if I am reading your link correctly. – jonathanbell Dec 29 '17 at 04:35
  • .NET does not support UTF-8 for strings: all strings are sequences of 16-bit `char` elements. Also, PowerShell does not recognize a UTF-8-encoded script file unless it has a UTF-8 BOM. So `'¯\_(ツ)_/¯' | Set-Clipboard` works fine in PowerShell v5+ (try it in the console directly); you just have a script file that PowerShell does not recognize as UTF-8, so it uses `[Text.Encoding]::Default` to read it. – user4003407 Dec 29 '17 at 04:47
  • @PetSerAl Yes! Setting a UTF-8 BOM helped me! Thank you! I didn't know I needed a BOM (coming from the UNIX world). – jonathanbell Dec 29 '17 at 05:37
  • powershell.exe and cmd.exe are shells, not consoles or terminals, and the Windows console isn't a cmd.exe engine. Prior to Windows 7 each console is hosted in the subsystem server process, csrss.exe. Windows 7+ hosts each console in an instance of conhost.exe. Windows 8 added the condrv.sys device driver and rewrote the API implementation. Windows 10 split the console host implementation into the legacy conhostv1.dll and the new conhostv2.dll, with many new features in the new console. – Eryk Sun Dec 29 '17 at 10:46


There are two distinct, independent aspects:

  • copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe
  • writing (echoing) ¯\_(ツ)_/¯ to the console

Prerequisite: PowerShell must properly recognize your source code's encoding in order for the solutions below to work: if your source code is UTF-8-encoded, be sure to save the enclosing files as UTF-8 with BOM for Windows PowerShell to recognize it.

  • Windows PowerShell, in the absence of a BOM, interprets source code as "ANSI"-encoded, referring to the legacy, single-byte, extended-ASCII code page in effect, such as Windows-1252 on US-English systems, and would therefore interpret UTF-8-encoded source code incorrectly.

  • Note that, by contrast, PowerShell Core uses UTF-8 as the default, so the BOM is no longer necessary (but still recognized).
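A byte-level sketch of the failure from the question, written in Python rather than PowerShell purely to make the bytes visible. It assumes that the "ANSI" code page is Windows-1252 and that the pipe to clip.exe re-encodes with ASCII (the Windows PowerShell default for $OutputEncoding, as mklement0 notes in a comment below). Under those assumptions it reproduces the ??\_(???)_/?? from the question:

```python
# Byte-level simulation (Python, not PowerShell) of how a BOM-less UTF-8
# script is misread by Windows PowerShell and then piped as ASCII.
# Assumption: the active "ANSI" code page is Windows-1252 (cp1252).

SHRUGGIE = "¯\\_(ツ)_/¯"  # "\\" is a single backslash


def misread_as_ansi(text: str, ansi: str = "cp1252") -> str:
    """Encode text as UTF-8 (how the editor saved the file), then decode
    those bytes with the ANSI code page (how a BOM-less file is read)."""
    return text.encode("utf-8").decode(ansi)


def pipe_as_ascii(text: str) -> bytes:
    """Encode as ASCII, replacing each non-ASCII character with '?',
    mimicking an ASCII $OutputEncoding when piping to an external program."""
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    garbled = misread_as_ansi(SHRUGGIE)
    print(garbled)                 # Â¯\_(ãƒ„)_/Â¯
    print(pipe_as_ascii(garbled))  # b'??\\_(???)_/??'
```

¯ is two bytes in UTF-8 and ツ is three, so each becomes that many "ANSI" characters, which is why the question shows ?? and ???.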


Copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe:

  • In Windows PowerShell v5.1+, you can use the built-in Set-Clipboard cmdlet to copy text to the clipboard from within PowerShell; given that PowerShell uses the .NET System.String type that is capable of representing all Unicode characters, there are no encoding issues.

    • Note that PowerShell Core, even when run on Windows, does NOT have this cmdlet (as of PowerShell Core v6.0.0-rc.2)
    • See this answer of mine for clipboard functions that work in earlier PowerShell versions as well as in PowerShell Core.
  • In earlier versions of Windows PowerShell and in PowerShell Core, use of clip.exe is a viable alternative, but its use requires additional work:

function shruggie() {
  $OutputEncoding = (New-Object System.Text.UnicodeEncoding $False, $False).psobject.BaseObject
  '¯\_(ツ)_/¯' | clip
  Write-Verbose -Verbose "Shruggie copied to clipboard." # see section about console output
}
  • New-Object System.Text.UnicodeEncoding $False, $False creates a BOM-less UTF-16LE encoding, which clip.exe understands.

    • The magic .psobject.BaseObject incantation is, unfortunately, required to work around a bug; in PSv5+, you can bypass this bug by using the following instead:
      [System.Text.UnicodeEncoding]::new($False, $False)
  • Assigning that encoding to the preference variable $OutputEncoding ensures that PowerShell uses it to pipe data to the external utility clip.exe.
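To make "BOM-less UTF-16LE" concrete, here is a small Python sketch of the bytes such an encoding produces (the constructor arguments $False, $False mean bigEndian = false and byteOrderMark = false). Python's utf-16-le codec never writes a BOM; the sketch prepends one explicitly only for comparison:

```python
# Sketch (Python, not PowerShell) of the bytes produced by a BOM-less
# UTF-16 little-endian encoding, like UnicodeEncoding($false, $false).
import codecs

SHRUGGIE = "¯\\_(ツ)_/¯"


def utf16le_no_bom(text: str) -> bytes:
    # "utf-16-le" emits two bytes per BMP character, low byte first, no BOM.
    return text.encode("utf-16-le")


def utf16le_with_bom(text: str) -> bytes:
    # Explicit little-endian BOM (FF FE) followed by the same payload.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    print(utf16le_no_bom("ツ"))          # b'\xc40', i.e. bytes C4 30 for U+30C4
    print(utf16le_with_bom("ツ")[:2])    # b'\xff\xfe'
```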


Writing ¯\_(ツ)_/¯ to the console:

Note: PowerShell Core on Unix platforms generally uses consoles (terminals) with a default encoding of (BOM-less) UTF-8, so no additional work is needed there.

To merely echo (print) Unicode characters (beyond the 8-bit range), it is sufficient to switch to a font that can display Unicode characters (beyond the extended ASCII range), because, as PetSerAl points out, PowerShell uses the Unicode version of the WriteConsole Windows API function to print to the console.

To support (most) Unicode characters, you must switch to one of the "TT" (TrueType) fonts.

PetSerAl points out in a comment that console windows on Windows are currently limited to a single 16-bit code unit per output character (cell); given that only (most of) the characters in the BMP (Basic Multilingual Plane) are self-contained 16-bit code units, the (rare) characters beyond the BMP cannot be represented.
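A quick Python illustration of the BMP limitation, counting UTF-16 code units per character. The emoji U+1F937 (SHRUG) is just an example of a non-BMP character picked for this sketch: it needs a surrogate pair, i.e. two 16-bit code units, so under the one-code-unit-per-cell limitation described above it cannot occupy a single console cell:

```python
# Count UTF-16 code units (the 16-bit units .NET strings are made of)
# for a few characters, and report whether each lies in the BMP.


def utf16_code_units(ch: str) -> int:
    return len(ch.encode("utf-16-le")) // 2


def is_bmp(ch: str) -> bool:
    return ord(ch) <= 0xFFFF


if __name__ == "__main__":
    for ch in ("¯", "ツ", "\U0001F937"):  # U+1F937 SHRUG lies outside the BMP
        print(f"U+{ord(ch):04X}  BMP={is_bmp(ch)}  code units={utf16_code_units(ch)}")
```

The first two characters report one code unit each; U+1F937 reports two (the surrogate pair D83E DD37).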

Sadly, even that may not be enough for some (BMP) Unicode characters, given that the Unicode standard is versioned and font representations / implementations may lag.

Indeed, as of Windows 10 release ID 1703, only a select few fonts can render ツ (Unicode character KATAKANA LETTER TU, U+30C4, UTF-8: E3 83 84):

  • MS Gothic
  • NSimSun
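The character details quoted above can be double-checked with Python's standard unicodedata module. This only verifies the character itself, not how any particular font renders it:

```python
# Report the code point, official name, and UTF-8 / UTF-16LE bytes of a character.
import unicodedata


def describe(ch: str) -> dict:
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "utf8": ch.encode("utf-8").hex(" ").upper(),        # bytes.hex(sep): Python 3.8+
        "utf16le": ch.encode("utf-16-le").hex(" ").upper(),
    }


if __name__ == "__main__":
    print(describe("ツ"))
```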

Note that if you want to (also) change how other applications interpret such output, you must again set $OutputEncoding:

For instance, to make PowerShell expect UTF-8 input from external utilities as well as output UTF-8-encoded data to external utilities, use the following:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

The above implicitly changes the code page to 65001 (UTF-8), as reflected in chcp (chcp.com).
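To see what $OutputEncoding changes for an external utility, here is a Python sketch of the bytes it would receive when the (correctly decoded) string is re-encoded as ASCII versus BOM-less UTF-8. It assumes, as noted in the comments on the next answer, that ASCII encoding replaces each non-ASCII character with ?:

```python
# Bytes an external program would receive on stdin for a given output encoding.
# errors="replace" mimics .NET's ASCII encoder substituting '?'.

SHRUGGIE = "¯\\_(ツ)_/¯"


def bytes_sent(text: str, output_encoding: str) -> bytes:
    return text.encode(output_encoding, errors="replace")


if __name__ == "__main__":
    print(bytes_sent(SHRUGGIE, "ascii"))  # b'?\\_(?)_/?'
    print(bytes_sent(SHRUGGIE, "utf-8"))  # b'\xc2\xaf\\_(\xe3\x83\x84)_/\xc2\xaf'
```

Only the UTF-8 variant can be decoded back into the original string by the receiving program.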

Note that, for backward compatibility, Windows console windows still default to the single-byte, extended-ASCII legacy OEM code page, such as 437 on US-English systems.

Unfortunately, as of v6.0.0-rc.2, this also applies to PowerShell Core, even though it has otherwise switched to BOM-less UTF-8 as the default encoding, as also reflected in $OutputEncoding.

mklement0
  • This is such a well researched answer, I'm blown away! Thank you for taking the time. Interesting what I've learned here! – jonathanbell Dec 30 '17 at 01:51
  • Also, if I'm reading [this](https://stackoverflow.com/a/40098904/1171790) right, it looks like PowerShell version 6 will be BOM-less, even on Windows. – jonathanbell Dec 30 '17 at 02:04
  • @jonathanbell: I'm glad to hear it, but please see my update based on PetSerAl's feedback. Re BOM-less UTF-8: indeed, thank god. – mklement0 Dec 30 '17 at 03:32
  • Wow, what a mess. :) – wp78de Dec 30 '17 at 04:29
  • For the copy to clipboard in PS, I had to change the encoding to `utf8-bom` and then use `|set-clipboard` and voila. – Timo Mar 29 '22 at 13:22

If you cannot use PowerShell 5's Set-Clipboard cmdlet (which is IMO the go-to solution), you can convert/encode your output so that clip.exe understands it correctly.

There are two ways to achieve what you want here:

  1. Feed clip.exe a UTF-16 file: clip < UTF16-Shruggie.txt (run this from cmd.exe, since PowerShell reserves the < operator)
    The important part here is to save the file with "Unicode" encoding, which means UTF-16 in little-endian byte order with a BOM.
  2. Encode the string appropriately (the following works in a PoSh editor like the ISE but unfortunately not in a regular console; see mklement0's answer for how to achieve this):
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
function shruggie() {
  [System.Text.Encoding]::Default.GetString(
    [System.Text.Encoding]::UTF8.GetBytes('¯\_(ツ)_/¯')
  ) | clip.exe
  Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
shruggie

This works for me. Here is an MSDN blog post that gives further explanations about $OutputEncoding/[Console]::OutputEncoding.
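For option 1, if you'd rather not save the file from an editor, the following Python sketch writes it as a little-endian BOM followed by UTF-16LE text. The file name UTF16-Shruggie.txt is the one used above; afterwards, run clip < UTF16-Shruggie.txt from cmd.exe:

```python
# Write text as UTF-16LE with a BOM ("Unicode" encoding) for use with clip.exe.
import codecs
from pathlib import Path

SHRUGGIE = "¯\\_(ツ)_/¯"


def write_utf16le_with_bom(path: Path, text: str) -> None:
    # FF FE followed by two bytes per BMP character, low byte first.
    path.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


if __name__ == "__main__":
    write_utf16le_with_bom(Path("UTF16-Shruggie.txt"), SHRUGGIE)
```

As mklement0 points out in the comment below, clip.exe may copy that BOM along with the text; dropping codecs.BOM_UTF16_LE from the sketch gives the BOM-less variant.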

wp78de
  • Solution 2. won't work in _regular console windows_, because `$OutputEncoding` - the preference variable PowerShell uses to determine the encoding to use to send strings to external utilities - defaults to _ASCII_, so any character with the high bit set will be represented as _literal_ `?` (the _ISE_ is a different story). The caveat with respect to solution 1. is that if the file starts with a _BOM_ (as is added by default with PowerShell's `>` operator, for instance), that BOM is _included_ in the data copied to the clipboard, which is undesired. – mklement0 Dec 30 '17 at 00:07
  • You are right, the second method works in a PowerShell editor but not in the regular console as supposed. I've tested this in ISE and there it works fine. – wp78de Dec 30 '17 at 04:27

The Set-Clipboard option posted above is the most direct answer, but as noted, it's a PoSH v5-and-higher thing. Also, depending on which OS the OP is on, not all cmdlets are available on all OS/PoSH versions. This is not to say that Set-Clipboard isn't, but since the OP says they're new, it's just a heads-up.

If you can't go there for whatever reason, you can create your own function and/or use add-on modules. See this post:

Convert Keith Hill's PowerShell Get-Clipboard and Set-Clipboard to a PSM1 script

Here are the results of using the Set-Clipboard function from the above post, with the OP's function modified to use it:

(Get-CimInstance -ClassName Win32_OperatingSystem).Caption
Microsoft Windows Server 2012 R2 Standard

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      4.0
WSManStackVersion              3.0
SerializationVersion           1.1.0.1
CLRVersion                     4.0.30319.42000
BuildVersion                   6.3.9600.18773
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0}
PSRemotingProtocolVersion      2.2

function Set-ClipBoard 
{
    Param
    (
        [Parameter(ValueFromPipeline=$true)]
        [string] $text
    )
    Add-Type -AssemblyName System.Windows.Forms
    $tb = New-Object System.Windows.Forms.TextBox
    $tb.Multiline = $true
    $tb.Text = $text
    $tb.SelectAll()
    $tb.Copy()
}

function New-Shruggie
{
    Set-ClipBoard -text '¯\_(ツ)_/¯'
    Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}

New-Shruggie

¯\_(ツ)_/¯ copied to clipboard.

Results pasted from the clipboard:

¯\_(ツ)_/¯

There are other options, however, such as the following, but the above are still the best route.

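First, remember that output is controlled by the console code page and by the interpreter (PoSH): the console defaults to a legacy OEM code page (such as 437 on US-English systems), and Windows PowerShell's $OutputEncoding, used when piping to external programs, defaults to ASCII.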

You can see PoSH's default setting for piping to external programs by looking at the output of the built-in variable:

$OutputEncoding

As PoSH creator Jeffrey Snover says: "The reason we convert to ASCII when piping to existing executables is that most commands today do not process UNICODE correctly. Some do, most don't."

So, all that being said, you can change the code page by doing things like...

[Console]::OutputEncoding = [System.Text.Encoding]::UTF8

Or ...

$OutputEncoding = New-Object -typename System.Text.UTF8Encoding

If sending output to a file...

$OutPutData | Out-File $outFile -Encoding UTF8
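Since the BOM question came up repeatedly in this thread (Windows PowerShell reads BOM-less scripts as "ANSI"), here is a Python sketch that checks a script file for a UTF-8 BOM and prepends one if it is missing. It assumes the file already holds valid UTF-8 and raises UnicodeDecodeError otherwise; which file to fix (e.g. your profile) is up to you:

```python
# Check for / add a UTF-8 BOM (EF BB BF) at the start of a script file.
import codecs
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    return path.read_bytes().startswith(codecs.BOM_UTF8)


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM if missing; return True if the file was changed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not valid UTF-8
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

For example, on a hypothetical BOM-less profile.ps1, add_utf8_bom(Path("profile.ps1")) returns True the first time and False on later calls.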
postanote
  • I see! Thank you for the detailed answer! I'm not interested in changing the CodePage in my PowerShell profile so I think that (for my needs) I will stick with [this](https://artofshell.com/2016/04/emojis-in-powershell-yes/). Again, thanks for the detailed answer. – jonathanbell Dec 29 '17 at 05:29