I have been fighting with this for more than a day and have Googled a lot to fix this problem, without any result. :(
I have the following code, which reads a UTF-8 encoded text file containing a list of names; my Perl script should stop when it finds a specific name. The names are French and often contain accents, and that is where it starts behaving unexpectedly.
Here is the code:
#!/usr/bin/perl
$ErrorWordFile = "./myFile.txt";
open FILEcorpus, $ErrorWordFile or die $!;
while (<FILEcorpus>)
{
    chomp;
    $_ =~ s/\r|\n//g;
    $normWord = $_;
    $string = "stéphane";
    if ( $normWord eq $string )
    {
        print "\nYES!! does work";
    }
    else
    {
        print "\nNO does NOT work";
    }
}
close(FILEcorpus);
The corpus file (./myFile.txt) contains only "stéphane\n".
The problem presumably comes from the UTF-8 encoding of the file and the accented characters, but apparently it is not that simple. I tried a lot of things, including
use utf8;
or
utf8::decode($normWord);
without any success. :(
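For reference, here is a minimal sketch of how I understand these two attempts are meant to fit together. It assumes the script itself is saved as UTF-8 and that the data file might start with a byte order mark. The helper name file_contains_name and the temporary file standing in for ./myFile.txt are made up for illustration. The idea is to decode both sides into character strings before comparing: `use utf8` for the literal in the source, and an `:encoding(UTF-8)` layer for the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                        # decode string literals in this source file as UTF-8
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if any line of the UTF-8 file equals $wanted.
sub file_contains_name {
    my ($path, $wanted) = @_;
    # Decode bytes into characters while reading, so both sides of eq are character strings.
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my $found = 0;
    while (my $line = <$fh>) {
        $line =~ s/^\x{FEFF}//;      # drop a byte order mark, if the file has one
        $line =~ s/[\r\n]+\z//;      # strip LF or CRLF line endings
        if ($line eq $wanted) {
            $found = 1;
            last;
        }
    }
    close $fh;
    return $found;
}

# Stand-in for ./myFile.txt so the sketch runs on its own (assumption, not my real file).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':encoding(UTF-8)';
print {$tmp_fh} "stéphane\n";
close $tmp_fh;

binmode STDOUT, ':encoding(UTF-8)';
print file_contains_name($tmp_path, "stéphane") ? "YES\n" : "NO\n";
```

With a file written this way the sketch should print YES; if the same check against the real file prints NO, that would point at the file's actual bytes (a BOM, a different encoding, or a decomposed "e" plus combining accent) rather than at the comparison itself.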
Any ideas?
Many thanks for your precious help!
Simon