File manipulation in Perl on Windows (Unicode characters in file name)
In Perl on Windows, I use Win32::Unicode, Win32::Unicode::File and Win32::Unicode::Dir. They work perfectly with Unicode characters in file names.
Just note that Win32::Unicode::File::open() (and new()) take their arguments in the reverse order compared to Perl's built-in open(): the mode comes first.
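A minimal sketch of the mode-first call, assuming that Win32::Unicode::File->new with no arguments followed by $fh->open($mode, $name) returns a true value on success (check the module's documentation for the exact behaviour). The helper name open_unicode is hypothetical. On non-Windows systems it falls back to the built-in open with the name encoded as UTF-8 bytes, which assumes a file system that expects UTF-8 names; this only exists so the same calling code can be tried elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file whose name may contain any Unicode
# characters. Returns a handle, or nothing on failure ($! should be set).
sub open_unicode {
    my ($mode, $name) = @_;

    if ($^O eq 'MSWin32') {
        # Loaded at run time so the sketch still compiles on other systems.
        require Win32::Unicode::File;
        my $fh = Win32::Unicode::File->new;
        # Argument order: mode first, then the file name.
        $fh->open($mode, $name) or return;
        return $fh;
    }

    # Elsewhere: assume the file system expects UTF-8 encoded names,
    # and use the built-in three-argument open.
    require Encode;
    open(my $fh, $mode, Encode::encode('UTF-8', $name)) or return;
    return $fh;
}
```

For example: my $fh = open_unicode('<', "I \x{2665} Perl.txt") or die "Can't open: $!"; On Windows the returned value is a Win32::Unicode::File object, so reading and writing go through that module's own methods; elsewhere it is an ordinary filehandle.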
You do not need to encode the characters manually. Just type them as they are (if your Perl script is saved as UTF-8), or use the \x{N} notation, where N is the character's hexadecimal code point (e.g. \x{2665} for ♥).
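A quick way to check that both ways of writing a name produce the same Perl string. This sketch assumes the script file is saved as UTF-8 and uses the use utf8 pragma, which is what tells Perl to treat such literals as characters rather than raw bytes; U+2665 (♥) is just an example code point.

```perl
use strict;
use warnings;
use utf8;   # decode this source file's string literals as UTF-8

# The same file name written two ways:
my $typed   = "I ♥ Perl.txt";          # character typed directly
my $escaped = "I \x{2665} Perl.txt";   # \x{N}, N = hexadecimal code point

binmode STDOUT, ':encoding(UTF-8)';
print $typed eq $escaped ? "identical\n" : "different\n";
printf "U+%04X\n", ord substr($escaped, 2, 1);
```

Run as a UTF-8 file, this should print "identical" followed by "U+2665".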
Printing out Unicode characters on Windows
Printing Unicode to the console on Windows is another problem. You can't use cmd.exe; use PowerShell ISE instead. The drawback of the ISE is that it's not a real console, so scripts can't read keyboard input through STDIN.
To get Unicode output, you need to set the output encoding to UTF-8 in every PowerShell ISE session you start. I suggest doing so in the ISE profile script, which runs at startup.
Procedure to have PowerShell ISE default to Unicode output:
1) To allow user PowerShell scripts (including your profile) to run at all, you first need to run:
Set-ExecutionPolicy RemoteSigned
2) Create or edit your Documents\WindowsPowerShell\Microsoft.PowerShellISE_profile.ps1 so that it contains something like:
perl -w -e "print qq!Initializing the console with Perl...\n!;"
[System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8;
The short Perl command is a trick that allows the System.Console property to be modified. Without it, you get an error when setting OutputEncoding.
If I recall correctly, you also have to change the font to Consolas.
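Once the profile is in place, a small script like the following can be run in the ISE to check the setup. It is only a sketch: the sample characters are arbitrary, and whether each one renders also depends on the font. It explicitly encodes STDOUT as UTF-8 so that Perl's output matches the UTF-8 OutputEncoding set in the profile; without the binmode call, Perl still prints wide characters but emits a "Wide character in print" warning.

```perl
use strict;
use warnings;

# Emit UTF-8 bytes, matching the UTF-8 OutputEncoding set in the profile.
binmode STDOUT, ':encoding(UTF-8)';

# Format each character together with its code point, e.g. "U+263A ☺".
sub sample_lines {
    return map { sprintf "U+%04X %s", ord $_, $_ } @_;
}

# Sample characters written with the \x{N} notation.
my @lines = sample_lines("\x{263A}", "\x{2665}", "\x{00E9}", "\x{4E2D}");
print "$_\n" for @lines;
```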
Even when Unicode characters print out fine, you may have trouble including them in command-line arguments. In these cases I've found that the \x{N} notation works. The Windows Character Map utility (charmap.exe) is your friend here for looking up code points.
(Edited heavily after I rediscovered the regular PowerShell's inability to display most Unicode characters, with references to PowerShell (non-ISE) removed. Now I remember why I started using the ISE...)