I have been racking my brain trying to figure out how to handle some common special characters that users enter in forms. Examples are the copyright sign, the registered sign, the fractions 1/2 and 1/4, etc. Here is what happens:
Users enter these characters, and they are saved to a plain text file. No problem: they are stored intact. But when the Perl CGI script reads them back and they are displayed in a browser, I get all these "A"s and other A-like characters with marks above them. I am running a subroutine on the string to try to convert these characters from their Unicode code points into HTML entities, but it doesn't seem to be working.
Perl Code:
#string with special characters
$special_chars=encodebc($special_chars);
sub encodebc{
$answer=$_[0];
$answer =~ s/:://gi;
$answer =~ s/\x{0022}/&quot;/g;
$answer =~ s/\x{0027}/&#39;/g;
$answer =~ s/\x{0026}/&amp;/g;
$answer =~ s/\x{003C}/&lt;/g;
$answer =~ s/\x{003E}/&gt;/g;
$answer =~ s/\x{0060}/&#96;/g;
$answer =~ s/\x{007B}/&#123;/g;
$answer =~ s/\x{007D}/&#125;/g;
$answer =~ s/\x{00A9}/&copy;/g;
$answer =~ s/\x{00AE}/&reg;/g;
$answer =~ s/\x{00AB}/&laquo;/g;
$answer =~ s/\x{00BB}/&raquo;/g;
$answer =~ s/\x{00A2}/&cent;/g;
$answer =~ s/\x{00B0}/&deg;/g;
$answer =~ s/\x{00B2}/&sup2;/g;
$answer =~ s/\x{00B3}/&sup3;/g;
$answer =~ s/\x{00B5}/&micro;/g;
$answer =~ s/\x{00BC}/&frac14;/g;
$answer =~ s/\x{00BD}/&frac12;/g;
$answer =~ s/\x{00BE}/&frac34;/g;
$answer =~ s/\x{00E1}/&aacute;/g;
$answer =~ s/\x{00E9}/&eacute;/g;
$answer =~ s/\x{00F1}/&ntilde;/g;
$answer =~ s/\x{00F5}/&otilde;/g;
$answer =~ s/\x{00F8}/&oslash;/g;
return $answer;
}
In the above code, I'm matching two-byte Unicode characters, so I don't understand where the "A" characters are coming from.
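For reference, the symptom can be reproduced in a few lines. This is only a minimal sketch, and it rests on an assumption I can't confirm from the above: that the form data reaches the script as undecoded UTF-8 bytes. The sample string and the `&copy;` replacement are illustrative, not taken from the real form. It compares the same substitution on the raw bytes and on a string decoded with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the input is raw UTF-8 bytes, so the copyright sign (U+00A9)
# is stored as the two bytes 0xC2 0xA9.
my $raw_bytes = "\xC2\xA9 2010";

# Without decoding, \x{00A9} matches only the second byte (0xA9) and the
# leading 0xC2 byte is left behind.
(my $undecoded = $raw_bytes) =~ s/\x{00A9}/&copy;/g;

# After decoding, the string holds the single character U+00A9, so the
# same substitution replaces the whole character.
(my $decoded = decode('UTF-8', $raw_bytes)) =~ s/\x{00A9}/&copy;/g;

# Show any leftover non-ASCII byte in hex so the output stays plain ASCII.
(my $visible = $undecoded) =~ s/([^\x00-\x7F])/sprintf('<%02X>', ord $1)/ge;
print "undecoded: $visible\n";
print "decoded:   $decoded\n";
```

If the assumption holds, the leftover 0xC2 byte would explain the stray character: in Latin-1 that byte is an "A" with a circumflex.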
Before you downvote me, please know that I have spent hours upon hours working on this and reading up on it. I appreciate anyone who can help me out here.