I'm trying to create/save HTML files in Perl in UTF-8, but nothing I have done so far works. A previous answer here on SO said to use binmode, so I tried that. Here is my code:

open (OUT, ">$sectionfilename");
binmode(OUT, ":utf8");
print OUT $section;
close OUT;

When I open these files in a text editor like Notepad they are still in ANSI encoding. What am I doing wrong?

Joshua

2 Answers

A text editor is a poor tool for examining low-level details such as encodings. Use a hex viewer or hex dumper instead. The modern way to write your example:

use autodie qw(:all);
open my $out, '>:encoding(UTF-8)', $sectionfilename;
print {$out} $section;
close $out;

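autodie enables automatic error checking: built-ins such as open and close throw an exception on failure, so you do not have to test each return value by hand. Note that the :all tag also covers system and exec, which requires the IPC::System::Simple module to be installed.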
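
If no hex viewer is at hand, the bytes can also be inspected from Perl itself. The following is only a minimal sketch: `hex_dump` and the scratch file name `hexdump_demo.txt` are made up for illustration and are not part of the original example.

```perl
use strict;
use warnings;

# Return the raw bytes of a file as space-separated hex pairs.
# hex_dump is a hypothetical helper name, not part of any module.
sub hex_dump {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                        # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close $in;
    return join ' ', unpack '(H2)*', $bytes;
}

# Write a string containing one non-ASCII character (U+263A) as UTF-8.
my $file = 'hexdump_demo.txt';       # assumed scratch file name
open my $out, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
print {$out} "Hi \x{263A}";
close $out;

my $dump = hex_dump($file);
print "$dump\n";                     # prints: 48 69 20 e2 98 ba
unlink $file;
```

The three bytes `e2 98 ba` are the UTF-8 encoding of U+263A, which shows the file really was written as UTF-8 regardless of what a text editor reports.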

daxim

Seems to work for me:

C:\Documents and Settings>cat a.pl
$sectionfilename = "a.txt";
$section = "Hello \x{263A}!\n";

open (OUT, ">$sectionfilename");
binmode(OUT, ":utf8");
print OUT $section;
close OUT;

C:\Documents and Settings>perl a.pl

C:\Documents and Settings>file a.txt
a.txt: UTF-8 Unicode text, with CRLF line terminators

But when I change the text to be written to:

$section = "Hello";

and run:

C:\Documents and Settings>perl a.pl

C:\Documents and Settings>file a.txt
a.txt: ASCII text, with no line terminators
codaddict
  • That is because text containing only characters below 128 is byte-for-byte identical in ASCII and UTF-8, so there is no way to tell the two apart. If you want `file` to report UTF-8, you need to force a UTF-8 BOM (byte order mark) at the beginning of the file, between `open()` and `binmode()`, with `print OUT "\x{ef}\x{bb}\x{bf}";` – Seki Jan 24 '23 at 13:07
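
The point made in the comment can be checked with a short sketch. It differs slightly from the comment's approach: instead of printing the three raw bytes before `binmode()`, it writes the character U+FEFF through an `:encoding(UTF-8)` layer, which should produce the same bytes. The helper `raw_hex` and both file names are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Read a file back as raw bytes and return them as one lowercase hex string.
sub raw_hex {                        # hypothetical helper name
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                        # slurp mode
    my $bytes = <$in>;
    close $in;
    return unpack 'H*', $bytes;
}

my $plain    = 'plain_demo.txt';     # assumed scratch file names
my $with_bom = 'bom_demo.txt';

# Pure ASCII through the UTF-8 layer: the same bytes an ASCII file would have.
open my $out, '>:encoding(UTF-8)', $plain or die "Cannot open $plain: $!";
print {$out} 'Hello';
close $out;

# U+FEFF written first is encoded by the layer as the bytes EF BB BF.
open $out, '>:encoding(UTF-8)', $with_bom or die "Cannot open $with_bom: $!";
print {$out} "\x{FEFF}", 'Hello';
close $out;

my $plain_hex = raw_hex($plain);
my $bom_hex   = raw_hex($with_bom);
print "$plain_hex\n$bom_hex\n";
unlink $plain, $with_bom;
```

The first file holds only `48656c6c6f`, which is why tools report it as ASCII; the second starts with `efbbbf`, the marker that lets tools such as `file` or Notepad identify it as UTF-8.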