#!/usr/bin/perl
use warnings;
use strict;
use Encode;
binmode STDOUT, ":encoding(utf8)";
open F1, "<:utf8", "$ARGV[0]" or die "$!";
open F2, "<", "$ARGV[0]" or die "$!";
my $a1 = <F1>;
chomp $a1;
my $a2 = <F2>;
chomp $a2;
if ($a1 eq $a2) {
print "$a1=$a2 is true\n";
} else {
print "$a1=$a2 is false\n";
}
my $b = decode("utf-8", $a2);
if ($a1 eq $b) {
print "$a1=$b is true\n";
} else {
print "$a1=$b is false\n";
}
I wrote a test program listed above. And create a text file with one line: 基地局.
When you run the program with this text file, you can get a false and a true.
I don't know what's in your program, but I guess the csv file is read as a plain text without any parsers or encode/decode procedures, whereas the xml file must be parsed by some library, so that the internal encoding mechanism is different for the two string variables, including some leading bytes of encoding notation.
Simply put, you can try to encode or decode one of the two string variables, and see if they match.
By the way, this is my first answer here, hope it can be a little bit helpful to you ;-)
From your dump results, it's obvious. The first variable stores 9 characters which constrcut 基地局 in utf-8 encoding in its internal structure. The second variable represents 3 characters in its internal structure. They have same byte stream, and are equal in a byte-stream view but not equal in a character-based comparison.
Use decode/encode can solve your problem.