What is the standard test in Perl to determine if a value is a sequence of bytes or an encoded string of characters? And if it's an encoded string, what character encoding is it in?
Let's assume the following complete Perl script:
'foo';
How would one determine if this literal string is a sequence of bytes or a string of characters in some encoding? And if it's a string of characters in some character encoding, what character encoding is it in?
This question is not about Unicode or UTF-8. It's about bytes versus characters in Perl generally. This question is also not about automated character encoding detection, which is a different topic entirely.
UPDATE
After initializing $letter, I want Perl to tell me what character encoding it thinks the letter stored in the variable $letter is in. I don't expect it necessarily to be right. Ensuring that Perl's understanding of the letter's character encoding is correct is my responsibility as the programmer. I get that. But there should be a simple, easy way to test what character encoding Perl thinks a character (or string of characters) is in. Isn't there?
C:\>perl -E "$letter = 'Ž'; say $letter =~ m/\w/ ? 'matches' : 'does not match'"
does not match
C:\>perl -MEncode -E "$letter = decode('UTF-8', 'Ž'); say $letter =~ m/\w/ ? 'matches' : 'does not match'"
does not match
C:\>perl -MEncode -E "$letter = decode('Windows-1252', 'Ž'); say $letter =~ m/\w/ ? 'matches' : 'does not match'"
matches
C:\>perl -MEncode -E "$letter = decode('Windows-1252', 'Ž'); $letter = encode('Windows-1252', $letter); say $letter =~ m/\w/ ? 'matches' : 'does not match'"
does not match
C:\>chcp
Active code page: 1252
C:\>
Can't Perl report on demand what character encoding it understands (rightly or wrongly) the value stored in $letter is in?
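For reference, here is the same experiment as a script: a minimal sketch, not a definitive test. It assumes the console really passes 'Ž' to Perl as the single Windows-1252 byte 0x8E, so that byte is written out explicitly. The helper name describe is hypothetical. The only built-in introspection used is utf8::is_utf8, which reports how Perl stores the string internally and does not name a source encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: summarise what can be observed about a scalar.
# utf8::is_utf8 reports Perl's internal storage flag only; it does not
# say which encoding the data originally came from.
sub describe {
    my ($value) = @_;
    return sprintf 'ord=0x%04X length=%d utf8_flag=%s word=%s',
        ord($value), length($value),
        (utf8::is_utf8($value) ? 'on' : 'off'),
        ($value =~ /\w/ ? 'yes' : 'no');
}

# Assumption: 0x8E is the byte the cmd.exe session sends for 'Ž' (code page 1252).
my $bytes = "\x8E";
my $chars = decode('Windows-1252', $bytes);   # bytes -> characters
my $again = encode('Windows-1252', $chars);   # characters -> bytes

for my $pair ([bytes => $bytes], [decoded => $chars], [reencoded => $again]) {
    my ($label, $value) = @$pair;
    printf "%-10s %s\n", $label, describe($value);
}
```

Nothing in this output names "Windows-1252" or any other encoding, which is exactly the piece of information I am asking how to obtain.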