I have a CSV file, say win.csv, whose text is encoded in windows-1252. First I use iconv to convert it to UTF-8.
$ iconv -o test.csv -f windows-1252 -t utf-8 win.csv
Then I read the converted CSV file with the following Perl script (utfcsv.pl).
#!/usr/bin/perl
use utf8;
use Text::CSV;
use Encode::Detect::Detector;
my $csv = Text::CSV->new({ binary => 1, sep_char => ';' });
# Read test.csv through a UTF-8 decoding layer.
open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
while (my $row = $csv->getline($fh)) {
    my $line = join " ", @$row;
    my $enc = Encode::Detect::Detector::detect($line);
    print "($enc) $line\n";
}
$csv->eof || $csv->error_diag();
close $fh;
$csv->eol("\r\n");
exit;
The output looks like this:
(UTF-8) .........
() .....
That is, the encoding of every line is detected as UTF-8 (or ASCII). But the actual output does not seem to be UTF-8. In fact, if I save the output to a file
$ ./utfcsv.pl > output.txt
then the encoding of output.txt is detected as windows-1252.
Question: How can I get the output text in UTF-8?
Notes:
- Environment: openSUSE 13.2 x86_64, perl 5.20.1
- I do not use Text::CSV::Encoded because its installation fails. (Besides, test.csv has already been converted to UTF-8, so it would be strange to need Text::CSV::Encoded.)
- I use the following script to check the encoding. (I also use it to find out the encoding of the initial CSV file win.csv.)
#!/usr/bin/perl
use Encode::Detect::Detector;
# "or" (not "||") so that die applies to open, not to the filename.
open my $in, "<", $ARGV[0] or die "open failed: $!";
while (my $line = <$in>) {
    my $enc = Encode::Detect::Detector::detect($line);
    chomp $enc;
    if ($enc) {
        print "$enc\n";
    }
}
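As a cross-check that does not depend on heuristic detection, one could also test whether a file decodes cleanly as UTF-8. The following is only a minimal sketch, assuming GNU iconv (which exits with a non-zero status when it meets an invalid input sequence); the helper name is_utf8 is made up for illustration. Note that a pure ASCII file also passes, since ASCII is a subset of UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Assumes GNU iconv, which returns a non-zero exit status on invalid input.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Report the result for each file given on the command line.
for f in "$@"; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

Saved under some name such as checkutf8.sh (hypothetical), it could be run as `sh checkutf8.sh test.csv output.txt` to compare the converted input with the script's output.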