As daxim points out, I have been misled. =encoding UTF-8 and =encoding utf-8 apply the strict encoding, and =encoding utf8 is the lenient encoding:
$ cat enc-test.pod
=encoding ENCNAME
=head1 TEST '\344\273\245\376\202\200\200\200\200\200'
=cut
(here \xxx means the literal byte with octal value xxx. \344\273\245 is a valid UTF-8 sequence, \376\202\200\200\200\200\200 is not)
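If you want to reproduce the test file, one way to get those raw bytes onto disk is printf, which expands \ooo octal escapes in its format string. This is only a sketch: the blank lines between the POD paragraphs are my assumption (POD normally wants them), and the od call is just there to inspect the result.

```shell
# Recreate enc-test.pod; printf turns each \ooo octal escape into one raw byte.
# The blank lines between the POD paragraphs are an assumption.
printf "=encoding ENCNAME\n\n=head1 TEST '\344\273\245\376\202\200\200\200\200\200'\n\n=cut\n" > enc-test.pod

# Dump the heading line byte by byte to confirm what was written.
LC_ALL=C sed -n 3p enc-test.pod | od -c
```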
=encoding utf-8:
$ perl -pe 's/ENCNAME/utf-8/' enc-test.pod | pod2cpanhtml | grep /h1
>TEST '以此�'</a></h1>
=encoding utf8:
$ perl -pe 's/ENCNAME/utf8/' enc-test.pod | pod2cpanhtml | grep /h1
Code point 0x80000000 is not Unicode, no properties match it; ...
Code point 0x80000000 is not Unicode, no properties match it; ...
Code point 0x80000000 is not Unicode, no properties match it; ...
>TEST '以�'</a></h1>
I originally thought they were all equivalent. The argument to =encoding is expected to be a name recognized by the Encode::Supported module. When you drill down into that document, you see
- the canonical encoding name is utf8
- the name UTF-8 is an alias for utf8, and
- names are case insensitive, so utf-8 is equivalent to UTF-8
What's the best practice? I'm not sure. I don't think you can go wrong using the official IANA name (as per daxim's answer), but you can't go wrong following the official Perl documentation, either.