Q: Why is Perl doing this? Can I override it?
It's not being escaped; it's a symptom of a character set translation issue. The question mark is the default replacement character used when a code point doesn't map to any character in the target character set.
The short answer to why Perl is doing this may be: by default, Perl writes to STDOUT using the ASCII character set. Since ASCII only covers code points U+0000 through U+007F, every other code point (for example, characters 128 through 255) gets translated to a question mark.
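To see that substitution in isolation, here is a minimal Perl sketch, assuming the core Encode module is available. It explicitly encodes a string to ASCII; `to_ascii` is a hypothetical helper name used only for illustration. It shows how a translation into ASCII produces question marks, not exactly where the translation happens in your program.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a Perl character string as ASCII bytes.
# With Encode's default CHECK value (FB_DEFAULT), any code point above
# U+007F is replaced by the substitution character '?' instead of dying.
sub to_ascii {
    my ($text) = @_;
    return encode('ascii', $text);
}

my $text = "caf\x{E9} costs \x{20AC}5";   # e-acute and a euro sign
print to_ascii($text), "\n";              # caf? costs ?5
```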
The short answer to how to override this behavior may be: tell Perl to use UTF-8 rather than ASCII on STDIN, STDOUT and STDERR by including a line like this in your Perl program:
use open qw(:std :utf8);
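A minimal sketch of what a `:utf8` output layer changes, assuming only core Perl. It writes to an in-memory filehandle instead of STDOUT so the resulting bytes can be inspected; `bytes_written` is a hypothetical helper, and `binmode` on a single handle is used here as a stand-in for what `use open qw(:std :utf8)` does for the standard handles.

```perl
use strict;
use warnings;

# Hypothetical helper: print a character string through a filehandle
# that has the given PerlIO layer, and return the raw bytes written.
sub bytes_written {
    my ($layer, $text) = @_;
    open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";
    binmode $fh, $layer or die "cannot set layer $layer: $!";
    print {$fh} $text;
    close $fh;
    return $buffer;
}

# Show each output byte in hex, separated by dots.
printf "%vX\n", bytes_written(':utf8', "caf\x{E9}");    # 63.61.66.C3.A9
```

With the UTF-8 layer in place, U+00E9 is written as the two UTF-8 bytes C3 A9 instead of being replaced.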
Another potential issue is the setting of the MySQL session character set variables (character_set_client, and also character_set_connection and character_set_results): the database connection may be using the latin1 character set while the database, server, or column character set is utf8, so a character set translation may also be occurring there.
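The same kind of loss can be sketched for a latin1 connection. MySQL performs that conversion on the server; this Perl sketch only imitates it with Encode, using ISO-8859-1 as a stand-in for MySQL's latin1 (which is actually closer to Windows-1252, though U+03A9 is missing from both). `latin1_round_trip` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper imitating a trip through a latin1 connection:
# characters ISO-8859-1 can represent survive, anything else becomes '?'.
sub latin1_round_trip {
    my ($text) = @_;
    my $bytes = encode('iso-8859-1', $text);   # unmappable code points -> '?'
    return decode('iso-8859-1', $bytes);
}

binmode STDOUT, ':encoding(UTF-8)';
print latin1_round_trip("caf\x{E9}"), "\n";          # e-acute survives
print latin1_round_trip("\x{3A9} = 1 ohm"), "\n";    # ? = 1 ohm
```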
It's also possible to specify the character set used by the database connection (for example, by issuing SET NAMES after connecting), to avoid an unwanted character set translation.
As a starting point for understanding character sets, here are two references you should have under your belt:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text