How can I write a Perl module with UTF-8 in its name and filename? My current try yields "Can't locate Täst.pm in @INC", but the file does exist. I'm on Windows, and haven't tried this on Linux yet.

test.pl:

use strict;
use warnings;
use utf8;
use Täst;

Täst.pm:

package Täst;
use utf8;

Update: My current workaround is to use Tast (ASCII) and put package Täst (Unicode) in Tast.pm (ASCII). It's confusing, though.

brian d foy
Tim
  • and you're also sure that your editor does UTF-8, and you have done chcp 65001 in the cmd window? – Ingo Apr 19 '11 at 14:46
  • You are really brave! :) – w.k Apr 19 '11 at 14:52
  • Your workaround will work for a pure OOP module, but it does mean that Täst won't be able to export any non-OO functions - `use Tast` will look for `Tast::import()`, and since the package name is Täst, won't find it. I've seen the same thing happen when Windows & Mac users `use cgi` in their scripts - perl finds the .pm, because the file system is case-insensitive, but it can't find a package named cgi, so fails to import any symbols from it. – Sherm Pendley Apr 19 '11 at 16:29
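
A minimal sketch of the point in the last comment, assuming the file is the ASCII-named Tast.pm and the package inside it is Täst, and assuming a reasonably modern perl. The `hello` function and the hand-written `import` are hypothetical, made up for illustration. The script writes the module to a temporary directory so that it runs as a single file. Since `use Tast` would look for `Tast->import` (which does not exist), it loads the file with `require` and calls `Täst->import` explicitly:

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempdir);

# Write an ASCII-named file that declares a non-ASCII package.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>:encoding(UTF-8)', "$dir/Tast.pm" or die "open: $!";
print {$fh} <<'EOM';
use strict;
use warnings;
use utf8;
package Täst;

# Hypothetical hand-rolled import: installs hello() into the caller.
sub import {
    my $caller = caller;
    no strict 'refs';
    *{"${caller}::hello"} = \&hello;
}

sub hello { return "hello from " . __PACKAGE__ }

1;
EOM
close $fh or die "close: $!";

unshift @INC, $dir;
require Tast;             # finds Tast.pm by its ASCII filename
Täst->import('hello');    # 'use Tast' would have tried Tast->import instead

binmode(STDOUT, ':encoding(UTF-8)');
print hello(), "\n";
```

The explicit `import` call is the extra step that keeps the workaround usable for exported functions; for a purely object-oriented module, `require Tast` alone would be enough.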

4 Answers


Unfortunately, Perl, Windows, and Unicode filenames really don't go together at the moment. My advice is to save yourself a lot of hassle and stick with plain ASCII for your module names. This blog post mentions a few of the problems.

cjm
  • Also see the [箆](http://backpan.perl.org/authors/id/M/MA/MARCEL/-0.01.tar.gz) distribution and the [Acme::Ünicöde](http://search.cpan.org/dist/Acme-Uenicoede/) distribution for the impact on the whole toolchain. – daxim Apr 20 '11 at 10:35

The use utf8 needs to appear before the package Täst, so that the latter can be correctly interpreted. On my Mac:

test.pl:

#!/usr/bin/perl

use strict;
use warnings;

use utf8;
use Tëst;

# 'use utf8' only declares the source encoding; STDOUT needs its own UTF-8 layer.
# (The 'encoding' pragma is deprecated and no longer supported as of Perl 5.26.)
binmode(STDOUT, ':encoding(UTF-8)');

Tëst::hëllö();

Tëst.pm:

use utf8;
package Tëst;

sub Tëst::hëllö() {
    print "Hëllö, wörld!\n";
}

1;

Output:

Macintosh:Desktop sherm$ ./test.pl 
Hëllö, wörld!

As I said though - I ran this on my Mac. As cjm said above, your mileage may vary on Windows.

Sherm Pendley
  • It doesn't work for me, but it's good to know that it is possible under some OS. – Tim Apr 19 '11 at 15:55
  • I think that's basically what it comes down to - **Perl** can cope, but it relies on the underlying OS & filesystem also being able to. – Sherm Pendley Apr 19 '11 at 15:59
  • OK. Thanks for noting the order of `use utf8`, by the way; I thought the pragma was always moved to a `BEGIN` block. – Tim Apr 19 '11 at 16:03
  • It is, but package *also* takes effect at compile time. When you have two statements that both execute at compile time, they're executed in order. – Sherm Pendley Apr 19 '11 at 16:09
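
The ordering described in the last comment can be seen with plain ASCII names. A small sketch (the package name Foo and the @order array are made up for illustration): each BEGIN block, like each `use`, runs as soon as the compiler reaches it, and `__PACKAGE__` reflects whichever `package` statement was compiled most recently.

```perl
use strict;
use warnings;

our @order;                          # package variable @main::order

BEGIN { push @order, __PACKAGE__ }   # compiled while still in package main
package Foo;
BEGIN { push @order, __PACKAGE__ }   # compiled after 'package Foo'
package main;

print "@order\n";                    # prints "main Foo"
```

The same source-order rule is why `use utf8` has to come before `package Tëst` in the module file.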

Unicode support often fails at the boundaries. Package and subroutine names need to map cleanly onto filenames, which is problematic on some operating systems. Not only does the OS have to create the filename that you expect, but you also have to be able to find it later as the same name.
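
One way the "same name" can end up different, sketched under the assumption that Unicode normalization is involved (HFS+ on the Mac is the usual example of a filesystem that stores decomposed names; other systems behave differently). Unicode::Normalize and Encode are core modules; the filename is the one from the question:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);
use Encode qw(encode);

my $name = "T\x{E4}st.pm";    # "Täst.pm" with a precomposed a-umlaut

for my $form ([NFC => NFC($name)], [NFD => NFD($name)]) {
    my ($label, $str) = @$form;
    my $bytes = encode('UTF-8', $str);
    my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
    printf "%s: %d characters, %d bytes: %s\n",
        $label, length($str), length($bytes), $hex;
}
# NFC: 7 characters, 8 bytes: 54 C3 A4 73 74 2E 70 6D
# NFD: 8 characters, 9 bytes: 54 61 CC 88 73 74 2E 70 6D
```

Both strings display as Täst.pm, but a byte-wise filename comparison treats them as two different files.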

We talked a little about the filename issue in Effective Perl Programming, and I summarized much more in "How do I create then use long Windows paths from Perl?". Jeff Atwood also touches on this in his post "Filesystem Paths: How Long is Too Long?".

brian d foy

I wouldn't recommend this approach if this is software you plan to release, to be honest. Even if you get it working fine for you, it's likely to be somewhat fragile on machines where UTF-8 isn't configured quite right, and/or filenames may not contain UTF-8 characters, etc.

David Precious