
I have already read through the following:

but I have probably missed some BASIC points.

Does using

use open qw(:utf8);

affect CPAN modules too? E.g. when some CPAN module opens a file, will it be opened with :utf8? Is this statement TRUE? (Or is the open pragma only lexically scoped?) AFAIK it affects modules too, but in an "inconsistent" way (probably that is a problem of the modules).

Does the open pragma have any effect on opendir? From what I have tried, no: I still need an extra decode on all filenames coming from readdir (in addition to NFC). So is IO::Dir something different that the open pragma doesn't cover?
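For illustration, the extra decode step described above might look roughly like this. It is only a sketch: it assumes the file system stores names as UTF-8 bytes, and the temporary directory and the sample name are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Spec;
use File::Temp qw(tempdir);
use Unicode::Normalize qw(NFC);

# Throwaway directory and sample name (both chosen only for this example).
my $dir  = tempdir(CLEANUP => 1);
my $name = "t\x{F6}k\x{F6}s.txt";    # character string, not bytes

# File names go out as bytes, so encode the name by hand.
my $path = File::Spec->catfile($dir, encode('UTF-8', $name));
open my $fh, '>', $path or die "open: $!";
close $fh;

# readdir hands back bytes again: decode, then normalize to NFC.
opendir my $dh, $dir or die "opendir: $!";
my @names = map  { NFC(decode('UTF-8', $_)) }
            grep { $_ ne '.' && $_ ne '..' }
            readdir $dh;
closedir $dh;
```

After this, @names holds character strings that can be compared with other decoded text.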

Does the open pragma affect sockets and pipes too (i.e. anything that is a sort of IO::Handle)?

Do all (or most) CPAN modules know how they need to do their I/O (utf8, latin1 or raw)? Probably not, because even a simple autodie doesn't work with the open pragma... :(

In many places I read a similar rule: "Remember the canonical rule of Unicode: always encode/decode at the edges of your application." This is a nice rule, but in practice the application edge means my own source code. CPAN modules are (usually) beyond the edge too, not only the "outer world" like the system or the network...

From my experience, 3/4 of the content of my short scripts (which use CPAN heavily) consists of top declarations and dozens of encode/decode/NFC calls for nearly everything...

E.g. even logging utilities need explicit encoding:

use Log::Any qw($log);
use Log::Any::Adapter ('File', 'file.log');
$log->error( encode('utf-8', "tökös"));

Even when I want to add a tie to my code, I need to replace every $key and $value with encoded versions.

Is this true, or did I miss some really basic point in all the above docs?

Some CPAN modules handle utf8 internally, like JSON::XS, YAML::XS, File::Slurp... (although I never succeeded in getting correct "things" from YAML::XS; pure YAML and JSON::XS work without any problems).

For some modules there are "hacks", like DBIx::Class::ForceUTF8, Template::Stash::ForceUTF8, HTML::FillInForm::ForceUTF8 and so on, which don't allow writing a correct application for "both" the utf and non-utf worlds... ;(

Many CPAN modules don't internally call the above 'hacked variants' (e.g. HTML::FillInForm::ForceUTF8) but only the plain one, so it is impossible to use them correctly with utf8... Others silently fail... ;(

A Plack application doesn't handle utf8 logging messages without the annoying "Wide character...." warning ;( /modern perl :(/ and I could continue ;(

From the above I "deduced" (probably wrongly) that I MUST know and remember, for every CPAN module, how it handles utf8 encoded strings, and because there is no "registry" anywhere, it is mostly trial and error.

So the main question is:

While I keep in mind that there is no magic bullet: is there some good way to detect and know "utf8 ready CPAN modules" that don't need special encode/decode before using them?

If someone needs to know, I'm using the following in every script:

use 5.014;
use warnings;
use utf8;
use feature qw(unicode_strings);
use charnames qw(:full);
use open qw(:utf8); # this is sometimes bad, so using only: use open qw(:std :utf8);
use Encode qw(encode decode);
use Unicode::Normalize qw(NFD NFC);

Hm.. I just "discovered" the utf8::all Perl module, which replaces readdir with a version that does the decode.

clt60
  • I agree with your point of view, but this question looks like a duplicate of: http://stackoverflow.com/questions/6281049/how-to-test-classify-cpan-modules-for-utf8-correctness . So: there is no registry of utf8-correct modules, and about 2/3 of CPAN simply doesn't care about encodings other than ascii (latin1), _modern perl_ included; e.g. Moose doesn't allow utf8 method names. – clt60 Jul 01 '13 at 17:16
  • [Recent versions](https://metacpan.org/changes/release/PJF/autodie-2.12) of [autodie](https://metacpan.org/module/autodie) **do** work with the [open](http://perldoc.perl.org/open.html) pragma. – Brad Gilbert Jul 08 '13 at 20:48

1 Answer


Emphasis mine:

The open pragma serves as one of the interfaces to declare default "layers" (also known as "disciplines") for all I/O. Any two-argument open, readpipe (aka qx//) and similar operators **found within the lexical scope of this pragma** will use the declared defaults. Even three-argument opens may be affected by this pragma when they don't specify IO layers in MODE.

So no, it doesn't affect any code in which the pragma isn't present. A handle opened within the scope of such a pragma won't lose its layers if passed to code outside of the scope of the pragma, though.
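A small sketch that makes the lexical scoping visible, assuming Perl was not started with -C or PERL_UNICODE (either would add default layers everywhere). The temporary file and the layers_of helper are invented for the example; PerlIO::get_layers is the built-in used to inspect the handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file to open twice (only its name is needed).
my (undef, $path) = tempfile(UNLINK => 1);

# Hypothetical helper: the layers of a handle as one string.
sub layers_of {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh);
}

my ($inside, $outside);
{
    use open qw(:utf8);    # in effect only until the closing brace
    open my $fh, '<', $path or die "open: $!";
    $inside = layers_of($fh);
    close $fh;
}
{
    # No pragma in this block, so no default layer is added.
    open my $fh, '<', $path or die "open: $!";
    $outside = layers_of($fh);
    close $fh;
}

print "inside the pragma's scope:  $inside\n";
print "outside the pragma's scope: $outside\n";
```

The first list should include utf8 and the second should not. An open call inside a CPAN module is compiled in that module's own scope, so it behaves like the second block.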


Tests to see what a module expects:

Input

  • Test 1
    • Have the input source return a string containing a code point in 80..FF and no code point above FF.
  • Test 2
    • Have the input source return a string containing a code point above FF.

Output

  • Test 1
    • Output a string containing a code point in 80..FF and no code point above FF. Pass the string through utf8::downgrade($_); first.
  • Test 2
    • Same as Test 1, but pass the string through utf8::upgrade($_); first.
  • Test 3
    • Output a string containing a code point above FF.
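The three output probes could be prepared along these lines. This is a sketch only: bytes_written is a hypothetical stand-in for "the module under test" that prints to an in-memory handle without any encoding layer, and the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Test 1: code points <= FF only, stored one byte per character.
my $down = "t\x{F6}k\x{F6}s";
utf8::downgrade($down);

# Test 2: the same characters, but stored internally as UTF-8.
my $up = "t\x{F6}k\x{F6}s";
utf8::upgrade($up);

# Test 3: a code point above FF.
my $wide = "\x{263A}";

# Hypothetical stand-in for the module's output path: print to an
# in-memory handle with no encoding layer, return what arrived.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "open: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $out_down = bytes_written($down);
my $out_up   = bytes_written($up);
my $out_wide = bytes_written($wide);

print "tests 1 and 2 gave the same bytes\n" if $out_down eq $out_up;
print "test 3 warned: $warnings[0]" if @warnings;
```

A sink that treats the downgraded and upgraded strings differently depends on Perl's internal storage format, which is a bug in that sink. A "Wide character" warning on the third string suggests the sink expects already encoded bytes.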
ikegami
  • Correct. You need to pass the expected type of handle. Should be documented, but I realise it's not always. If it's not, the module probably can't handle non-ASCII chars anyway. – ikegami Jul 01 '13 at 16:05
  • In general, test. Test 1: Use a string containing a char in 80..FF and no character above FF. If you're testing an output, pass the string through `utf8::downgrade($_);` first. Test 2: Same as test 1, but use `utf8::upgrade($_);`. Test 3: Use a string containing a char above FF. – ikegami Jul 01 '13 at 16:08
  • @Nemo, what about Perl itself? `open` and `opendir` still work on the premise that file names are made of bytes. – ikegami Jul 02 '13 at 20:56