PrintStream appears to be outputting incorrect characters for UTF-8 encoding

Question

I'm trying to print the full block character (U+2588) as UTF-8 through a PrintStream. Instead of a single character, the output shows one character for each byte of its UTF-8 encoding: the bytes e2 96 88 come out as â–ˆ.

I initialize the PrintStream object for UTF-8 and print a single full block character with the PrintStream.println(String) method:

PrintStream ps = new PrintStream(System.out,
                                 true,
                                 "UTF-8");
ps.println("\u2588");

(This prints the characters which I showed above.)

This page shows the full block character's Unicode code point, as well as the individual hexadecimal bytes of its UTF-8 encoding. Looking up each of these bytes (which I did here, here, and here) gives the characters shown above. Why is each of these bytes being interpreted as a separate character by the PrintStream?

EDIT: This was run against both JDK 1.8.0 update 131 & JDK 11.0.1 on Windows 10 Education 64 bit. I have tested this in Eclipse and through the Command Line. I did not use any compilation flags.

The output for that same code sample on my computer is what I have shown. The online compiler does, however, give the correct output. — Lane Surface, Feb 08 '19 at 05:43
This works for me as well. Could you update your question with full details of your environment (O/S, IDE, JDK, how you are running the code, etc...) where this is not working? — skomisa, Feb 08 '19 at 06:11
Your console probably has not been configured to show Unicode. — Sweeper, Feb 08 '19 at 06:42
As suggested by @Sweeper, I think your `Command Prompt` window cannot render the _full block_ character. This works fine for me when writing to the **Output** window within NetBeans 10.0, but I can't get that _full block_ character to render properly in a `Command Prompt` or `PowerShell` window regardless of which font I select. That being the case, I think you might have more luck reframing your question (_"How to print full block char from Command Prompt window?"_), and asking it on a site such as [serverfault](https://serverfault.com/) or [superuser](https://superuser.com/) — skomisa, Feb 08 '19 at 07:18
[1] Actually, ignore my previous comment, since I can successfully paste and **echo** the full block character in a `Command Prompt` window, so it is not a font issue. [2] Also, I just tested in Eclipse and it works fine there - no idea why that is not working for you. — skomisa, Feb 08 '19 at 07:34
You need to set an appropriate font (e.g. _Consolas_ or _Lucida Console_), and change the code page ([chcp 65001](https://stackoverflow.com/a/22340018/2985643)), before running your app from the `Command Prompt` window. Then it works fine. — skomisa, Feb 08 '19 at 08:02
I had luck getting it to work in Eclipse after changing `Run Configurations > Common > Encoding` and setting the option to UTF-8. — Lane Surface, Feb 08 '19 at 17:46
The UTF-8 encoded form of `U+2588` is the byte sequence `E2 96 88`, so the `PrintStream` is producing the correct bytes (you would see that if you write out the bytes to a binary file). `â–ˆ` is simply what you get when those UTF-8 bytes are *subsequently* processed as Windows-1252/Latin-1 instead of as UTF-8. That is not the fault of `PrintStream` itself, but of your terminal window when `System.out` writes out the UTF-8 bytes. — Remy Lebeau, Feb 12 '19 at 23:41
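
The explanation in the last comment can be checked without involving the console at all. The following is a minimal sketch, not code from the thread: the class name `MojibakeDemo` and its helper methods are made up for illustration, and it assumes the `windows-1252` charset is available in the running JDK. It captures what a UTF-8 `PrintStream` writes into an in-memory buffer, then decodes those same bytes as Windows-1252, which appears to be what the misconfigured terminal is doing. The results are printed as hex bytes and code point numbers so that the output itself does not depend on the console encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class MojibakeDemo {

    // Writes s through a UTF-8 PrintStream into memory and returns the raw bytes.
    static byte[] utf8Bytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream ps = new PrintStream(buffer, true, "UTF-8");
            ps.print(s);
            ps.flush();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Formats bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decodes bytes with the named charset, as a console using that code page would.
    static String decodeAs(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] bytes = utf8Bytes("\u2588");
        System.out.println(hex(bytes)); // prints: e2 96 88

        // Assumption: the terminal interprets the bytes as Windows-1252.
        String misread = decodeAs(bytes, "windows-1252");
        StringBuilder codePoints = new StringBuilder();
        for (int i = 0; i < misread.length(); i++) {
            codePoints.append(String.format("U+%04X ", (int) misread.charAt(i)));
        }
        System.out.println(codePoints.toString().trim()); // prints: U+00E2 U+2013 U+02C6
    }
}
```

U+00E2, U+2013 and U+02C6 are the three characters `â–ˆ` from the question, so the `PrintStream` is emitting the correct bytes and the garbling happens only when they are decoded for display.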
