I have a Perl string containing Unicode characters and I want to create a file with this string as its filename. It should work on Windows, Linux and Mac regardless of the locale in use. Here is my code:

use strict;
use warnings FATAL => 'all';

use Encode::Locale;
use Encode;

# ファイル.c
my $file = "\x{30D5}\x{30A1}\x{30A4}\x{30EB}.c";

$file = encode(locale_fs => $file);

open(my $filehdl, '>', $file) or die("Unable to create file: $!");
close($filehdl);

I use the encode function because, according to this answer:

Perl treats file names as opaque strings of bytes. They need to be encoded as per your "locale"'s encoding (ANSI code page).

However, this code fails with the following error:

Unable to create file: Invalid argument at .\perl.pl line 15.

I took a closer look at how the string is encoded by encode:

my $rep = sprintf '%v02X', $file;
print($rep);

This prints:

3F.3F.3F.3F.2E.63

In my current locale (CP-1252), this corresponds to ????.c. Each Unicode character has been replaced by a question mark. I think the question marks are expected here, because the characters in my string cannot be represented in the CP-1252 encoding.
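To make the substitution explicit rather than silent, here is a minimal sketch. It assumes the encoding is hard-coded as `cp1252` instead of being taken from `locale_fs`, so the result does not depend on the machine it runs on; the variable names `$lossy`, `$strict` and `$error` are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# ファイル.c, the same string as in the code above
my $file = "\x{30D5}\x{30A1}\x{30A4}\x{30EB}.c";

# Default behaviour: characters that do not exist in the target
# encoding are silently replaced by a substitution character.
my $lossy = encode('cp1252', $file);

# Strict behaviour: FB_CROAK makes encode die instead of substituting.
# LEAVE_SRC keeps $file untouched (with a CHECK value, encode may
# otherwise modify its input string).
my $strict = eval {
    encode('cp1252', $file, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $error = $@;

print defined $strict ? "encodable\n" : "not encodable: $error";
```

With a check like this, a name that cannot be represented in the chosen encoding is reported before `open` is ever called.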

So, my question is: is there a way to create a file with a name containing Unicode characters?

Pierre
  • Just, you cannot. There is no guarantee that filesystem supports Unicode, and every OS has special cases on what characters are allowed. We are not ready for it. – Giacomo Catenazzi Aug 26 '21 at 15:30
  • @GiacomoCatenazzi You mean that even if I'm sure my OS can create the file (which is the case if I use File Explorer directly), I can't write portable code for doing that, right? – Pierre Aug 26 '21 at 15:31
  • It's not clear what you imagine that should mean. If the OS and the filesystem support UTF-8, using that should be straightforward; but of course, the assumption that they do isn't portable. – tripleee Aug 26 '21 at 15:43
  • I would be surprised if there is a portable way (portable also within the same OS, but across different users and different file systems, e.g. USB sticks or external disks). But let's see if somebody has a solution. – Giacomo Catenazzi Aug 26 '21 at 15:45
  • If you are on Windows, see also [Win32::Unicode::File](https://metacpan.org/pod/Win32::Unicode::File) and [this](https://stackoverflow.com/q/62318215/2173773) question – Håkon Hægland Aug 26 '21 at 15:49
  • Some background information. We used to have 8-bit encodings, so an operating system may accept file names in a legacy encoding. Then came Unicode, and filesystems started using UCS-2, but this required a new API. UTF-8 is relatively new (and it is not the default on Windows), so if you pass an 8-bit string it is difficult to know whether it will be converted to UTF-16, interpreted in the local 8-bit encoding, or treated as UTF-8. In short: an API mess (especially on the Windows side). Compatibility vs. the Unicode world – Giacomo Catenazzi Aug 26 '21 at 15:50
  • On a Windows system that uses CP1252 you have to generate the filename in this encoding, otherwise you will see unprintable characters in the name. Furthermore, the CP1252 code page does not cover the whole character range covered by UTF-8; for this reason we have various versions of Windows (for example, Asian versions). And external media such as USB drives and external disks can have completely different filesystems, supporting different encodings which MS Windows does not understand. – Polar Bear Aug 26 '21 at 19:00
  • @PolarBear: OTOH NTFS (and more modern file systems, but also FAT with long names) supports UTF-16. @Pierre: some background information: https://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as As you can see, it is more a problem of the API, and I think Perl (like most portable programs) just uses portable functions, so there is no guarantee of Unicode support. – Giacomo Catenazzi Aug 27 '21 at 06:33

1 Answer

For Windows there is a module, Win32::LongPath, which not only allows long file names but also supports Unicode characters.

I wrote myself a module for all the file and directory IO that I need. On Windows it uses that module's functions, and elsewhere the standard Perl ones, like so:

use Carp;
use Fcntl qw( :flock :seek );
use constant USE_LONG => ($^O eq 'MSWin32') ? 1 : 0;   # /Win/i would also match "darwin" (macOS) and "cygwin"
use if USE_LONG, 'Win32::LongPath', ':funcs';

sub open
{
    my $f       = shift; # file
    my $m       = shift;    # mode
    my $l       = @_ ? (shift) : 'utf8';    # encoding
    my $lock    = $m eq '<' ? LOCK_SH : LOCK_EX;
    length $l
        and $m .= ":$l";
    my $h;
    # openL needs a reference to the handle; CORE::open avoids ambiguity with this sub
    ( USE_LONG ? openL( \$h, $m, $f ) : CORE::open( $h, $m, $f ) )
        or confess "Can't open file: '$f' ($^E)";
    flock( $h, $lock );
    return $h;
}

That way the code is portable. It runs on a Linux server as well as on my Windows PC at home.
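For the non-Windows branch, a minimal sketch of creating the file with explicitly encoded name bytes could look like this. It assumes a filesystem that stores names as opaque bytes and conventionally uses UTF-8 (typical on Linux and macOS); on Windows this would produce a garbled name, which is why `openL` is used there. The temporary directory and the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;
use File::Temp qw(tempdir);

# ファイル.c as a Perl character string
my $name = "\x{30D5}\x{30A1}\x{30A4}\x{30EB}.c";

# Assumption: the filesystem expects UTF-8 encoded names.
my $bytes = encode('UTF-8', $name);

# Work in a throwaway directory that is removed on exit.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, $bytes );

open( my $fh, '>', $path ) or die "Unable to create file: $!";
close($fh) or die "Unable to close file: $!";

print -e $path ? "created\n" : "missing\n";
```

Encoding the name explicitly avoids relying on how Perl happens to store the string internally.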

Sadko