I have the following code:

use utf8;
open($file, '>:encoding(UTF-8)', "さっちゃん.txt") or die $!;
print $file "さっちゃん";

But I get the file name as ã•ã£ã¡ã‚ƒã‚“.txt

I was wondering whether there is a way to make this work as I would expect (that is, to get a Unicode file name) without resorting to Win32::API, Win32API::*, or moving to another platform and using a Samba share to modify the files.

The intent is to ensure we do not have any Win32 specific modules that need to be loaded (even conditionally).

Archimedes Trajano
  • It works fine on my side (Windows XP, Cygwin Perl 5.10). Are you sure the problem is with Perl and not with something else? Do you really save the source in UTF-8 encoding? – n0rd May 13 '11 at 13:56
  • possible duplicate of [What is the universal way to use file I/O API with unicode filenames?](http://stackoverflow.com/questions/2796127/what-is-the-universal-way-to-use-file-i-o-api-with-unicode-filenames) – daxim May 13 '11 at 14:36
  • @n0rd I am using ActiveState Perl rather than Cygwin – Archimedes Trajano May 14 '11 at 13:53
  • Yup, I tried running it on ActivePerl, and it creates a file with a garbled name. – n0rd May 14 '11 at 20:24

3 Answers

Perl treats file names as opaque strings of bytes. They need to be encoded as per your "locale"'s encoding (ANSI code page).
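
As a rough illustration of the "opaque bytes" point, the sketch below reinterprets the UTF-8 bytes of the question's file name as cp1252. The result appears consistent with the garbled name reported in the question. The variable names are illustrative, and what you see for bytes that cp1252 leaves undefined (such as 0x81) may differ between environments.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "さっちゃん.txt", written with escapes so the script's own encoding doesn't matter
my $name = "\x{3055}\x{3063}\x{3061}\x{3083}\x{3093}.txt";

# The bytes that reach the OS if the string is passed through as UTF-8
my $utf8_bytes = encode('UTF-8', $name);

# What a reader that assumes cp1252 makes of those same bytes
my $garbled = decode('cp1252', $utf8_bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print "$garbled\n";
```

Each Japanese character occupies three bytes in UTF-8, so a single character turns into up to three unrelated cp1252 characters.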

On Windows, this is usually cp1252. The code page number is returned by the GetACP system call (prepend "cp" to get the encoding name). However, cp1252 doesn't support Japanese characters.
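
A minimal sketch of what that limitation looks like in practice, assuming the ANSI code page is cp1252 (hard-coded here rather than queried from the system). It uses only the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "さっちゃん.txt", written with escapes so the script's own encoding doesn't matter
my $name = "\x{3055}\x{3063}\x{3061}\x{3083}\x{3093}.txt";

# Default behaviour: characters cp1252 cannot represent are replaced
# by the encoding's substitution character
my $lossy = encode('cp1252', $name);
print "$lossy\n";

# Strict behaviour: die instead of substituting.
# LEAVE_SRC keeps $name from being modified by the call.
my $strict_ok = eval {
    encode('cp1252', $name, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print $strict_ok ? "representable in cp1252\n" : "not representable in cp1252\n";
```

Checking with `FB_CROAK` before calling `open` is one way to detect that a name cannot be expressed in the code page at all, instead of silently creating a file under a different name.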

Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins*. Win32::LongPath uses this wide interface, so you could use its functions instead of the builtins to avoid encoding-related constraints.

* — Perl's support for Windows sucks in some respects.

ikegami
  • That is correct: encoding to cp1252 yields ? for Japanese characters, which makes it an invalid Windows file name, since ? is not a valid file name character. – Archimedes Trajano May 14 '11 at 14:05
  • @Archimedes Trajano, you can configure encode to return something other than "?", so you could create a *valid* Windows file name. However, you can't create the file name you *want* using `CreateFileA` (what Perl uses). You have to use `CreateFileW`, and Win32API::File provides access to it. – ikegami May 14 '11 at 21:20
  • @ikegami that's correct, but that's why I stated in my original question to not use Win32API stuff nor use a remote Samba share. – Archimedes Trajano May 15 '11 at 01:05
  • @Archimedes Trajano, you said not to use Win32::API. Despite the similar name, Win32API::File is completely unrelated. For the third time, Perl builtins don't use `CreateFileW`, and you need to use `CreateFileW`. As such, you need an XS module that provides access to `CreateFileW`, and Win32::API (with some extra work) and Win32API::File (without extra work) are such modules. – ikegami May 15 '11 at 06:33
  • I have rephrased the question to ensure that Win32 specific modules get considered. The intent of the question is to ensure we do not have any Win32 specific modules that need to be loaded (even conditionally). – Archimedes Trajano Jun 05 '11 at 05:14
  • @Archimedes Trajano, ok? The answer doesn't change. You need the `CreateFileW` system call, and it's provided as `CreateFileW` by Win32API::File. – ikegami Jun 05 '11 at 07:09
  • @ikegami, thank you for the clarification. I had a similar question to the one the OP posted and never figured out why until now. Thank you again :) – Mike Jun 05 '11 at 11:21

Use Encode::Locale:

use utf8;
use Encode::Locale;
use Encode;

open($file, '>:encoding(UTF-8)', encode(locale_fs => "さっちゃん.txt") ) or die $!;
print $file "さっちゃん";
godegisel

The following produces a Unicode file name on Windows 7 using ActiveState Perl.

#-----------------------------------------------------------------------
# Unicode file names on Windows using Perl
# Philip R Brenan at gmail dot com, Appa Apps Ltd, 2013
#-----------------------------------------------------------------------

use feature ":5.16";
use Data::Dump qw(dump);
use Encode qw/encode decode/;
use Win32API::File qw(:ALL);

# Create a file with a Unicode name

my $e  = "\x{05E7}\x{05EA}\x{05E7}\x{05D5}\x{05D5}\x{05D4}".
         "\x{002E}\x{0064}\x{0061}\x{0074}\x{0061}"; # File name as Unicode code points
my $f  = encode("UTF-16LE", $e);  # Wide (UTF-16LE) form expected by CreateFileW
my $g  = eval dump($f);           # Round-trip through dump to get a plain byte string
   $g .= chr(0).chr(0);           # Terminate with a 16-bit NUL
my $F  = Win32API::File::CreateFileW
 ($g, GENERIC_WRITE, 0, [], OPEN_ALWAYS, 0, 0); # Create file via Win32API
say $^E if $^E;                   # Print any error message

# Write to the file

OsFHandleOpen(FILE, $F, "w") or die "Cannot open file";
binmode FILE;                       
print FILE "hello there\n";       
close(FILE);