
It is possible to write Perl documentation in UTF-8. To do so, you declare the encoding in your POD:

=encoding NNN

But what should you write in place of NNN? Different sources give different answers.

What is the correct answer? What is the correct string to be written in POD?

bessarabov

2 Answers

=encoding UTF-8

According to IANA, charset names are case-insensitive, so utf-8 is the same.

utf8 is Perl's lax variant of UTF-8. For safety, however, you want your POD processors to apply the strict one.
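
One way to see the distinction is to ask the core Encode module which encoding each name resolves to. This is only a minimal sketch: the helper name `canonical_name` is made up for illustration, and it says nothing about how any particular POD processor looks up the `=encoding` argument.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: return Encode's canonical name for a label,
# or 'unknown' if Encode does not recognise the label at all.
sub canonical_name {
    my ($label) = @_;
    my $enc = find_encoding($label);
    return defined $enc ? $enc->name : 'unknown';
}

# Compare the three spellings discussed in this thread.
for my $label (qw(UTF-8 utf-8 utf8)) {
    printf "%-6s => %s\n", $label, canonical_name($label);
}
```

With the Encode shipped in current Perls, the two hyphenated spellings should resolve to `utf-8-strict`, while `utf8` resolves to the lax `utf8` encoding.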

daxim
  • Thank you. This is the answer I wanted to get =) One more thing: so [perlpod](http://perldoc.perl.org/perlpod.html), with its `=encoding utf8`, is incorrect. Do you think it is worth proposing a patch? – bessarabov Aug 07 '13 at 17:06
  • It's not a big thing. Do what you want. – daxim Aug 07 '13 at 17:08

As daxim points out, I have been misled. =encoding UTF-8 and =encoding utf-8 apply the strict encoding, and =encoding utf8 applies the lenient encoding:

$ cat enc-test.pod
=encoding ENCNAME

=head1 TEST '\344\273\245\376\202\200\200\200\200\200'

=cut

(Here \xxx means the literal byte with octal value xxx. \344\273\245 is a valid UTF-8 sequence; \376\202\200\200\200\200\200 is not.)

With =encoding utf-8:

$ perl -pe 's/ENCNAME/utf-8/' enc-test.pod | pod2cpanhtml | grep /h1
>TEST &#39;&#20197;&#27492;&#65533;&#39;</a></h1>

With =encoding utf8:

$ perl -pe 's/ENCNAME/utf8/' enc-test.pod | pod2cpanhtml | grep /h1
Code point 0x80000000 is not Unicode, no properties match it; ...
Code point 0x80000000 is not Unicode, no properties match it; ...
Code point 0x80000000 is not Unicode, no properties match it; ...
>TEST &#39;&#20197;&#2147483648;&#39;</a></h1>
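
The same comparison can be sketched without a POD toolchain by calling Encode directly on the two byte sequences from the test file. The helper name `decodes_cleanly` is hypothetical, and the sketch assumes only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# \344\273\245 is the valid UTF-8 encoding of U+4EE5. The 7-byte
# sequence starting with \376 is not valid in strict UTF-8.
my $valid   = "\344\273\245";
my $invalid = "\376\202\200\200\200\200\200";

# Hypothetical helper: returns 1 if $bytes decodes under the encoding
# called $name without an error, 0 otherwise.
sub decodes_cleanly {
    my ($name, $bytes) = @_;
    # FB_CROAK makes decode die on malformed input; LEAVE_SRC stops
    # decode from modifying $bytes in place.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $ok = eval { decode($name, $bytes, $check); 1 };
    return $ok ? 1 : 0;
}

for my $name (qw(UTF-8 utf8)) {
    printf "%-5s valid: %d  invalid: %d\n",
        $name,
        decodes_cleanly($name, $valid),
        decodes_cleanly($name, $invalid);
}
```

The strict `UTF-8` decoder should accept the first sequence and reject the second. What the lax `utf8` decoder does with the second sequence may depend on the Perl version and build, so no result is claimed for it here.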

My original answer claimed that they are all equivalent. The argument to =encoding is expected to be a name recognized by the Encode::Supported module. When you drill down into that document, you see

  • the canonical encoding name is utf8
  • the name UTF-8 is an alias for utf8, and
  • names are case insensitive, so utf-8 is equivalent to UTF-8

What's the best practice? I'm not sure. I don't think you can go wrong using the official IANA name (as per daxim's answer), but you can't go wrong following the official Perl documentation, either.

mob
  • That alias part of the documentation misled you: the hyphenated and unhyphenated names are treated differently. Try: `perl -MEncode=decode -MDevel::Peek=Dump -e'Dump decode "utf-8", "\xfe\x82\x80\x80\x80\x80\x80", Encode::FB_CROAK | Encode::LEAVE_SRC'` – daxim Aug 07 '13 at 17:19
  • Wow! Thank you for the great work in showing the difference between utf8 and utf-8. – bessarabov Aug 08 '13 at 04:26