perl prints 3 wrong characters instead of unicode character

Question

I've been having trouble with the print function, and I suspect I'm missing something small. I've searched everywhere and tried various things, but I can't find the solution.

I'm trying to print braille characters in Perl. I got the value 2881 from a table and converted it to hexadecimal. When I try to print the character with that hexadecimal code, Perl prints 3 characters instead.

Code:

#!/usr/local/bin/perl
  use utf8;
  print "\x{AF1}";

Output:

C:\Users\ElizabethTosh\Desktop>perl testff.pl
Wide character in print at testff.pl line 3.
α½▒

Have a look at https://stackoverflow.com/questions/627661/how-can-i-output-utf-8-from-perl — mttrb, Jun 27 '17 at 03:44
That particular fix, using binmode(STDOUT, ":utf8"); to format output, suppresses the warning, but still prints out 3 instead of 1. Do you think it could be my version of perl? I can't seem to find this issue elsewhere. — Liz, Jun 27 '17 at 03:50

ikegami · Answer 1 · 2017-06-27T05:37:57.623

3

Issue #1: You need to tell Perl to encode the output for your terminal.

Add the following to your program.

use Win32 qw( );
use open ':std', ':encoding(cp'.Win32::GetConsoleOutputCP().')';

use utf8; merely specifies that the source file is encoded using UTF-8 instead of ASCII. It has no effect on how output is encoded.
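To see what an encoding layer does independently of any console, here is a minimal sketch that prints through a UTF-8 layer into an in-memory buffer and then inspects the bytes. The buffer and the variable names are illustrative assumptions, not part of the original answer; on a real console the layer would be the one selected with `use open`.

```perl
use strict;
use warnings;

# Write one character through a UTF-8 encoding layer into an in-memory buffer.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "Cannot open buffer: $!";
print {$fh} "\x{2881}";
close($fh) or die "Cannot close buffer: $!";

# The buffer now holds the encoded bytes, not the character itself.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $buffer;
print length($buffer), " bytes: $hex\n";    # 3 bytes: E2 A2 81
```

The single character becomes three bytes on output. Whether those bytes show up as one braille cell depends on the console decoding them as UTF-8.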

Issue #2: Your terminal probably can't handle that character.

The console of US-English machines likely expects cp437. Its character set doesn't include any braille characters.

You could try switching to code page 65001 (UTF-8) using chcp 65001. You may also need to switch the console's font to one that includes braille characters. (MS Gothic worked for me, although it does weird things to the backslashes.)

Issue #3: You have the wrong character code.

U+0AF1 GUJARATI RUPEE SIGN (૱): "\x{AF1}" or "\N{U+0AF1}" or chr(2801)
U+0B41 ORIYA VOWEL SIGN U (ୁ): "\x{B41}" or "\N{U+0B41}" or chr(2881)
U+2801 BRAILLE PATTERN DOTS-1 (⠁): "\x{2801}" or "\N{U+2801}" or chr(10241)
U+2881 BRAILLE PATTERN DOTS-18 (⢁): "\x{2881}" or "\N{U+2881}" or chr(10369)
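The decimal/hexadecimal mix-up in the list above can be checked from Perl itself. The following is a small sketch using the core `charnames` module. It treats 2881 first as a decimal number and then as a hexadecimal one; the loop and variable names are only illustrative.

```perl
use strict;
use warnings;
use charnames ();

# 2881 read as decimal versus "2881" read as hexadecimal.
for my $code (2881, hex('2881')) {
    my $name = charnames::viacode($code) // 'unassigned';
    printf "U+%04X (decimal %d): %s\n", $code, $code, $name;
}
# U+0B41 (decimal 2881): ORIYA VOWEL SIGN U
# U+2881 (decimal 10369): BRAILLE PATTERN DOTS-18
```

If the table lists the value as 2881 and the braille block is expected, the value is already hexadecimal and can be used directly as "\x{2881}".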

All together,

use strict;
use warnings;
use feature qw( say );

use Win32 qw( );
use open ':std', ':encoding(cp'.Win32::GetConsoleOutputCP().')';

say(chr($_)) for 0x2801, 0x2881;

Output:

>chcp 65001
Active code page: 65001

>perl a.pl
⠁
⢁

edited Jun 27 '17 at 05:37

answered Jun 27 '17 at 05:31

ikegami


unfortunately none of the above are working, the solutions make sense to me but I think I may have a hardware issue. – Liz Jun 27 '17 at 05:48
Not working is not an adequate description of the problem. What did you get? What OS? What's the "This is" line of `perl -v`? What font did you switch your console to use? – ikegami Jun 27 '17 at 05:49
I'm using windows 10, and perl is saying that the &Win::GetConsoleOutputCP subroutine is undefined; I learned about what GetConsoleOutputCP is, a function the kernel uses that returns code pages, and that the open keyword in perl opens files. I'm not sure how to proceed since what I've read suggests that open is for opening files. – Liz Jun 29 '17 at 04:11
You miscopied my code if that's the error message you got – ikegami Jun 29 '17 at 04:13
Yes, I am sure, and you have just confirmed it: The message in that linked image is different than one you posted earlier (`Win32` vs `Win`). – ikegami Jun 29 '17 at 04:22
What is the output of `perl -MWin32 -le"print $Win32::VERSION"`? You need 0.45+ (released 5 years ago) – ikegami Jun 29 '17 at 04:24
My output is ".44". I'm currently trying to figure out the fix. I just installed Perl with the Padre IDE, so I'm not sure why it's already outdated. Did I confuse the Perl version with kernel properties? – Liz Jun 29 '17 at 04:33
Just upgrade it. `cpan Win32`. Alternatively, use the number output by `chcp`, which should be `65001` if you've switched the code page as I instructed. (Note that `chcp` only affects the console in which it is used.) – ikegami Jun 29 '17 at 04:36
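For readers who hit the same undefined-subroutine error, one possible workaround is to fall back to a fixed encoding when `Win32::GetConsoleOutputCP` is unavailable. This is only a sketch: `console_encoding` is a hypothetical helper name, and the fallback to UTF-8 assumes the console has been switched with `chcp 65001` (or that the script runs on a non-Windows terminal that uses UTF-8).

```perl
use strict;
use warnings;

# Hypothetical helper: ask Win32 for the console code page if possible,
# otherwise assume UTF-8 (for example after `chcp 65001`).
sub console_encoding {
    my $cp = eval { require Win32; Win32::GetConsoleOutputCP() };
    return $cp ? "cp$cp" : 'UTF-8';
}

binmode(STDOUT, ':encoding(' . console_encoding() . ')');
print "\x{2801}\n";
```

The `eval` catches both a missing Win32 module and an old Win32 that lacks the function, so the script keeps running in either case. Upgrading Win32 as suggested above remains the cleaner fix.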

weibeld · Answer 2 · 2017-06-27T07:51:01.543

0

If you save a character with UTF-8, and it's displayed as 3 strange characters instead of 1, it means that the character is in the range U+0800 to U+FFFF, and that you decode it with some single-byte encoding instead of UTF-8.
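This effect can be reproduced without a console. The sketch below encodes the character from the question as UTF-8 and then deliberately decodes the bytes as cp437. The helper name `misread_as` is hypothetical, and cp437 is assumed to be the console's code page, as is typical for a US-English Windows console.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical helper: encode a character as UTF-8, then decode the
# resulting bytes with the wrong (single-byte) encoding.
sub misread_as {
    my ($char, $wrong_encoding) = @_;
    my $bytes = encode('UTF-8', $char);
    return decode($wrong_encoding, $bytes);
}

my $garbled = misread_as("\x{AF1}", 'cp437');

binmode(STDOUT, ':encoding(UTF-8)');
printf "%d characters: %s\n", length($garbled), $garbled;    # 3 characters: α½▒
```

The three characters match the output shown in the question: the UTF-8 bytes E0 AB B1 are each displayed as a separate cp437 character.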

So, change the encoding of your terminal to UTF-8. If you can't do this, redirect the output to a file:

perl testff.pl >file

And open the file with a text editor that supports UTF-8, to see if the character is displayed correctly.

You want to print the character U+2881 (⢁), and not U+0AF1. 2881 is already in hexadecimal.

To get rid of the Wide character in print warning, set the input and output of your Perl program to UTF-8:

use open ':std', ':encoding(UTF-8)';

Use this instead of use utf8;, which only tells Perl to interpret the program text as UTF-8.

Summary

Source file (testff.pl):

#!/usr/local/bin/perl
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';
print "\x{2881}";

Run:

> perl testff.pl
⢁

edited Jun 27 '17 at 07:51

answered Jun 27 '17 at 04:26

weibeld


hmmm doesn't yield the same output for me... there must be an issue with my version of perl or the os, thanks for the response. – Liz Jun 27 '17 at 04:35
So what is your output now? – weibeld Jun 27 '17 at 04:37
As mentioned, check the character encoding of your terminal, and set it to UTF-8. Then it should work. – weibeld Jun 27 '17 at 04:55
I would recommend `use open ':std', ':encoding(UTF-8)';` instead of `binmode(STDOUT, ":utf8");`. It also adjusts STDIN and STDERR, and it sets the default for file handles opened in scope. – ikegami Jun 27 '17 at 05:16
