I am writing a program to fix mangled encoding, specifically Greek text (ISO-8859-7) that was misread as Latin-1 (ISO-8859-1).

I created a function that works as intended; a variable with badly encoded text is converted properly.

When I pass $ARGV[0] to this function, however, the input does not seem to be interpreted correctly.

Here is a test program to demonstrate the issue:

#!/usr/bin/env perl

use 5.018;
use utf8;
use strict;
use open qw(:std :encoding(utf-8));
use Encode qw(encode decode);

sub unmangle {
 my $input = shift;

 print $input . "\n";
 print decode('iso-8859-7', encode('latin1',$input)) . "\n";
}


my $test = "ÁöéÝñùìá";  # should be Αφιέρωμα

say "fix variable:";
unmangle($test);

say "\nfix argument:";
unmangle($ARGV[0]);

When I run this program with the same input as my $test variable, the results are not the same (I expected them to be):

$ ./fix_bad_encoding.pl "ÁöéÝñùìá"
fix variable:
ÁöéÝñùìá
Αφιέρωμα

fix stdin:
ÃöéÃñùìá
ΓΓΆΓ©ΓñùìÑ

How do I get $ARGV[0] to behave the way the $test variable does?

MERM
  • Further research (here: https://stackoverflow.com/questions/9730835/how-to-handle-utf8-on-the-command-line-using-perl-or-python and here: https://perldoc.perl.org/perlrun#%2a-C-%5b_number%2flist_%5d%2a) showed me that adding the `-CA` flag to perl makes `$ARGV[0]` behave as desired. Now all I have to do is figure out how to enable this option from within my program instead of running `perl -CA ./fix_bad_encoding.pl ÁöéÝñùìá`. – MERM Jan 10 '22 at 22:57
  • Tip: `utf8` (a non-standard encoding) should be `utf-8` (the standard encoding) – ikegami Jan 10 '22 at 23:40

2 Answers

`use utf8` decodes the source, and `use open qw(:std ...)` handles STDIN (which you don't use), STDOUT and STDERR. Nothing decodes @ARGV, so decode it yourself:

$_ = decode("UTF-8", $_) for @ARGV;
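
As a minimal sketch of how that decoding step could sit in the question's script: this assumes the shell passes arguments as UTF-8 (typical for a UTF-8 locale, but an assumption here). The helper name `decode_args` is hypothetical, and `unmangle` is adapted from the question to return the fixed string rather than print it.

```perl
#!/usr/bin/env perl
use 5.018;
use strict;
use warnings;
use utf8;
use open qw(:std :encoding(UTF-8));
use Encode qw(encode decode);

# Hypothetical helper: turn raw UTF-8 argument bytes into character strings.
# Assumes the terminal/shell hands arguments over as UTF-8.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

# Adapted from the question: re-read Latin-1 characters as ISO-8859-7.
sub unmangle {
    my ($input) = @_;
    return decode('iso-8859-7', encode('latin1', $input));
}

# Decode once at the boundary, then work with characters everywhere else.
say unmangle($_) for decode_args(@ARGV);
```

Decoding once, right where the data enters the program, keeps the rest of the code free of byte-versus-character questions.
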
ikegami

-CA tells Perl the arguments are UTF-8 encoded. You can decode the argument from UTF-8 yourself:

unmangle(decode('UTF-8', $ARGV[0]));

Also, it's not "stdin" (that would be reading from *STDIN), but "argument".
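
To see why the undecoded argument goes wrong, it can help to compare string lengths. The sketch below simulates the argument with a hypothetical `$raw` variable standing in for `$ARGV[0]`, assuming a UTF-8 terminal. Without decoding, Perl treats each of the 16 UTF-8 bytes as its own Latin-1 character. That is why the question's output shows `Ã` (byte 0xC3) where `Á` was typed, and `Γ` (0xC3 in ISO-8859-7) after the conversion.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Simulate what the shell passes: the UTF-8 bytes of the mangled text.
# (Assumes a UTF-8 terminal; $raw stands in for $ARGV[0].)
my $raw = encode('UTF-8', 'ÁöéÝñùìá');

# Undecoded, Perl sees one "character" per byte.
my $raw_len = length $raw;

# After decoding, the string matches the literal in the source file.
my $chars     = decode('UTF-8', $raw);
my $chars_len = length $chars;

printf "raw: %d, decoded: %d\n", $raw_len, $chars_len;   # raw: 16, decoded: 8
```
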

choroba
  • Even more digging (here: https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default) showed me that while I can't enable `-CA` via a pragma or other code inside the program, I can set the environment variable `PERL_UNICODE='A'`, which makes the program act as desired. – MERM Jan 10 '22 at 23:42
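
If the script might be run both with and without `-CA` or `PERL_UNICODE=A`, decoding `@ARGV` unconditionally would decode it twice. One possible guard, sketched here, checks the `${^UNICODE}` variable, which reflects the `-C` settings; the value 32 for the `A` flag comes from the `-C` table in perlrun. The helper name `args_as_characters` is hypothetical, and the sketch ignores the locale-dependent `L` flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bit value of the "A" (decode @ARGV) flag in perlrun's -C table.
use constant UNICODE_ARGV => 32;

# Hypothetical helper: return the arguments as character strings,
# decoding from UTF-8 only when Perl has not already done so.
sub args_as_characters {
    my @args = @_;
    return @args if ${^UNICODE} & UNICODE_ARGV;   # -CA or PERL_UNICODE=A in effect
    return map { decode('UTF-8', $_) } @args;
}

my @args = args_as_characters(@ARGV);
```
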