9

tldr: I downgraded to JDK 17 (17.0.2) and now it works...

I was watching a beginner's Java tutorial by Kody Simpson on YT (youtube.com/watch?v=t9LP9Nt9Nco). In that tutorial Kody prints Unicode symbols like "☯Ωøᚙ", but for me it just prints "?" (a question mark).

char letter = '\u1699';
System.out.println(letter);

I tried pretty much every solution on Stack Overflow, such as:

  • Changing File Encoding to UTF-8, although mine was using UTF-8 by default.
  • Putting '-Dconsole.encoding=UTF-8' and '-Dfile.encoding=UTF-8' in the Edit Custom VM options.
  • Messing with Region Settings in control panel.

None of it worked.

Every post was also from many years ago, such as this one, which is from 12 years ago:

unicode characters appear as question marks in IntelliJ IDEA console

I ended up deleting and re-downloading IntelliJ because I thought I had messed up some settings and wanted a fresh start. This time I set the Project SDK to an older version, Oracle OpenJDK 14.0.1, and now somehow it worked and printed the 'ᚙ' symbol.

Then I realized the problem might be the latest version of the JDK, which is version 18, so I downloaded JDK 17.0.2, and BOOM, it still works and prints out the symbol 'ᚙ', so that's nice :). But when I switched back to JDK 18 it just prints "?" again.

Also, it's strange because on JDK 18 I can copy-paste the ᚙ symbol directly into the code editor:

char letter = 'ᚙ';
System.out.println(letter);

But when I press RUN and try to PRINT ... it STILL GIVES QUESTION MARK.

I have no clue why this happens. I started learning to code 2 days ago, so I'm probably dumb, or the new version has a bug, but I never found a solution through Google or here, which is why I'm making my first ever Stack Overflow post.

skomisa
Xin69
  • 2
    Which platform and which terminal are you using? – phuclv Mar 31 '22 at 00:34
  • Does it print a literal "?" or "�"/"□"? The font being used in the terminal may not support the character you are printing. – Locke Mar 31 '22 at 00:40
  • I copy-pasted your snippet `char letter = '\u1699'; System.out.println( letter );` into IntelliJ 2022.1 Beta (Ultimate Edition) and ran successfully, displaying `ᚙ`, the OGHAM LETTER EAMHANCHOLL character. I wish I knew how to tell the character encoding currently in use in IntelliJ's `Run` console. – Basil Bourque Mar 31 '22 at 00:42
  • 1
    @BasilBourque How about navigating to **File > Settings... > Editor > General > Console** and viewing the value of **Default Encoding**? But even if that gives you what you are looking for, and I'm not sure it does, it would be more convenient if you could get/set the console's encoding from the Status bar. – skomisa Mar 31 '22 at 01:42
  • The program is correct. But either you do not have fonts which support such symbols (you may need to install many "Noto fonts" from Google), or your terminal settings are incorrect: be sure that fallback fonts are available, since no single font can support so many characters. – Giacomo Catenazzi Mar 31 '22 at 07:23
  • 1
    @GiacomoCatenazzi If the console font simply didn't contain the character 'ᚙ' wouldn't it be rendered as a replacement character ('�') rather than a question mark ('?'), which is what the OP is seeing? It seems more likely that this is a console settings issue (encoding?) rather than console font issue, so `println(letter)` is trying to render something invalid/meaningless. Yet none of that really explains why the OP can resolve the problem simply by using JDK 17 instead of JDK 18. – skomisa Apr 01 '22 at 00:10
  • 3
    To those who are voting to close this question because it is "Not reproducible or was caused by a typo", you are wrong. It is definitely reproducible and definitely not caused by a typo. – skomisa Apr 01 '22 at 06:04
  • It is possible. For this reason I recommend, as a first debugging step, to *print to a file*, which eliminates a lot of potential problems. Nothing should mangle your files, so if the output is still wrong there, the problem is in the program. As for the replacement character: often yes, but it is up to the program to choose how to handle it. And terminals are complex when it comes to fonts: some people recommend standard text libraries, some recommend just using shaping, others direct font rendering, so do not expect "standard text behaviour" in terminals. – Giacomo Catenazzi Apr 01 '22 at 06:58
  • 1
    [1] Please don't embed your solution within your question. Instead, create an answer to your own question. That is more helpful to the SO community. [2] While your approach of regressing to JDK 17 certainly resolves the issue, it is not a solution to the problem; it is just a workaround which avoids addressing it. A proper fix can be implemented with a simple change to your code on JDK 18. – skomisa Apr 01 '22 at 20:01

4 Answers

13

I can replicate your problem: printing works correctly when your code is compiled and run with JDK 17, and fails when it is compiled and run with JDK 18.

One of the changes implemented in Java 18 was JEP 400: UTF-8 by Default. The summary for that JEP stated:

Specify UTF-8 as the default charset of the standard Java APIs. With this change, APIs that depend upon the default charset will behave consistently across all implementations, operating systems, locales, and configurations.

That sounds good, except one of the goals of that change was (with my emphasis added):

Standardize on UTF-8 throughout the standard Java APIs, **except for console I/O**.

So I think your problem arose because you had ensured that the console's encoding in IntelliJ IDEA was UTF-8, but the PrintStream that you were using to write to that console (i.e. System.out) was not.

The Javadoc for PrintStream states (with my emphasis added):

All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or the default charset if not specified.

Since your PrintStream was System.out, you had not specified any "encoding or charset", and were therefore using the "default charset", which was presumably not UTF-8. So to get your code to work on Java 18, you just need to ensure that your PrintStream is encoding with UTF-8. Here's some sample code to show the problem and the solution:

package pkg;

import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Humpty {

    public static void main(String[] args) {

        char letter = 'ᚙ';
        String charset1 = System.out.charset().displayName();  // charset() requires JDK 18

        // System.out encodes with its own charset, so this prints '?' if that charset cannot represent the character.
        System.out.println("Writing the character " + letter + " to a PrintStream with charset " + charset1); // fails

        // Wrap the same file descriptor (stdout) in a PrintStream that explicitly encodes as UTF-8.
        PrintStream ps = new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8);
        String charset2 = ps.charset().displayName(); // charset() requires JDK 18
        ps.println("Writing the character " + letter + " to a PrintStream with charset " + charset2); // works
    }
}

This is the output in the console when running that code:

C:\Java\jdk-18\bin\java.exe -javaagent:C:\Users\johndoe\AppData\Local\JetBrains\Toolbox\apps\IDEA-U\ch-0\221.5080.93\lib\idea_rt.jar=64750:C:\Users\johndoe\AppData\Local\JetBrains\Toolbox\apps\IDEA-U\ch-0\221.5080.93\bin -Dfile.encoding=UTF-8 -classpath C:\Users\johndoe\IdeaProjects\HelloIntellij\out\production\HelloIntellij pkg.Humpty
Writing the character ? to a PrintStream with charset windows-1252
Writing the character ᚙ to a PrintStream with charset UTF-8

Process finished with exit code 0

Notes:

  • PrintStream has a new method in Java 18 named charset() which "returns the charset used in this PrintStream instance". The code above calls charset(), and shows that for my machine my "default charset" is windows-1252, not UTF-8.
  • I used IntelliJ IDEA 2022.1 Beta (Ultimate Edition) for testing.
  • In the console I used font DejaVu Sans to ensure that the character "ᚙ" could be rendered.
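
As a rough illustration of why the failure shows up as a literal "?" rather than a replacement glyph, the sketch below encodes the character by hand. It assumes the windows-1252 charset is available in the running JDK (it was the default reported on the machine above); the class name EncodeDemo and the helper hexBytes are made up for this example. When a charset cannot represent a character, encoding substitutes that charset's replacement byte, and for windows-1252 that byte is 0x3F, the ASCII question mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {

    // Encodes the text with the given charset and returns the bytes as space-separated hex.
    static String hexBytes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letter = "\u1699"; // the character from the question, written as an escape

        // Assumption: windows-1252 is available in this JDK.
        Charset cp1252 = Charset.forName("windows-1252");

        // windows-1252 has no mapping for U+1699, so the encoder emits '?' (0x3F).
        System.out.println("windows-1252: " + hexBytes(letter, cp1252));

        // UTF-8 can represent the character, using three bytes.
        System.out.println("UTF-8:        " + hexBytes(letter, StandardCharsets.UTF_8));
    }
}
```

The first line should show the single byte 3F and the second the three bytes E1 9A 99. That would explain why the console receives an ordinary question mark: the character is lost during encoding, before any font gets involved.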

UPDATE: To address the issue raised in the comments below by Mostafa Zeinali, the PrintStream used by System.out can be redirected to a UTF-8 PrintStream by calling System.setOut(). Here's sample code:

    String charsetOut = System.out.charset().displayName();
    if (!"UTF-8".equals(charsetOut)) {
        System.out.println("The charset for System.out is " + charsetOut + ". Changing System.out to use charset UTF-8");
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8));
        System.out.println("The charset for System.out is now " + System.out.charset().displayName());
    }

This is the output from that code on my Windows 10 machine:

The charset for System.out is windows-1252. Changing System.out to use charset UTF-8
The charset for System.out is now UTF-8

Note that System.out is a final variable, so you can't directly assign a new PrintStream to it. This code fails to compile with the error "Cannot assign a value to final variable 'out'":

System.out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8); // Won't compile
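
For anyone who wants to run the redirection as a whole program, here is a minimal self-contained sketch of the same idea. The class name Utf8Console and the helpers utf8 and install are hypothetical names chosen for this example. It also redirects System.err via System.setErr(), which the snippet above does not do, and it avoids charset() so that it compiles on JDKs older than 18 too.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {

    // Wraps any output stream in an auto-flushing PrintStream that encodes as UTF-8.
    static PrintStream utf8(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    // Replaces System.out and System.err with UTF-8 streams over the standard file descriptors.
    static void install() {
        System.setOut(utf8(new FileOutputStream(FileDescriptor.out)));
        System.setErr(utf8(new FileOutputStream(FileDescriptor.err)));
    }

    public static void main(String[] args) {
        install();
        // Written as an escape so the result does not depend on the source file's encoding.
        System.out.println("The character is " + '\u1699');
    }
}
```

Calling install() once at the start of main() should be enough for code that uses System.out afterwards. The console itself still has to be set to UTF-8 and use a font that contains the character.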
skomisa
  • 1
    Yeah, Windows is the only OS which doesn't use UTF-8 by default. The good news: you can set it as the default. And PowerShell has its own additional "problems" (a very different design, including opaque encodings), so do not consider it a normal terminal, but rather its own environment. – Giacomo Catenazzi Apr 01 '22 at 07:03
  • Question, can this be solved without downgrading to java 17? If it can, please, I'd like to know how! – Mostafa Zeinali Apr 18 '22 at 07:47
  • BTW, setting -Dfile.encoding="UTF-8" does NOT work! – Mostafa Zeinali Apr 18 '22 at 07:53
  • 1
    @MostafaZeinali I do not understand why you are posting those comments here. My answer specifically details how to resolve the issue with sample code, without downgrading to Java 17, so why are you asking whether it is possible? Does that solution not work for you? Also, my answer does not even mention **-Dfile.encoding=UTF-8** so why bring it up here? – skomisa Apr 18 '22 at 07:57
  • You do mention that making sure the PrintStream is UTF-8 solves the problem. But I cannot find an explanation of how to make System.out use UTF-8 on Java 18. You're wrapping FileDescriptor.out in another stream. But whenever and wherever in this application anyone uses System.out, they will be using a stream with a non-UTF-8 charset. Please see my answer. – Mostafa Zeinali Apr 18 '22 at 08:02
  • 2
    @MostafaZeinali See the Javadoc for the method `System.setOut()` which reassigns the "standard" output stream. For example, you could do this: `System.setOut(new PrintStream(new FileOutputStream( FileDescriptor.out), true, StandardCharsets.UTF_8));` to _"make System.out to use UTF-8 on java 18"_. Does that resolve your concern? If not, consider posting a new question. Comments are not the place for extended discussion. – skomisa Apr 18 '22 at 08:14
  • 2
    @MostafaZeinali I have updated my answer to address the issue you raised on how to get `System.out` to use UTF-8. – skomisa Apr 18 '22 at 22:23
4

TLDR: Use this on Java 18:

-Dfile.encoding="UTF-8" -Dsun.stdout.encoding="UTF-8" -Dsun.stderr.encoding="UTF-8"

From JEP 400:

There are three charset-related system properties used internally by the JDK. They remain unspecified and unsupported, but are documented here for completeness: sun.stdout.encoding and sun.stderr.encoding — the names of the charsets used for the standard output stream (System.out) and standard error stream (System.err), and in the java.io.Console API. sun.jnu.encoding — the name of the charset used by the implementation of java.nio.file when encoding or decoding filename paths, as opposed to file contents. On macOS its value is "UTF-8"; on other platforms it is typically the default charset.

As you can see, those two system properties "remain unspecified and unsupported". But they solved my problem. Therefore, please use them at your own risk, and DO NOT use them in production env. I'm running Eclipse on Windows 10 btw.

I think there must be a good way to set the default charset of JVM upon running, and it is stupid that passing -Dfile.encoding="UTF-8" does not do that. As you can read in JEP 400:

If file.encoding is set to "UTF-8" (i.e., java -Dfile.encoding=UTF-8), then the default charset will be UTF-8. This no-op value is defined in order to preserve the behavior of existing command lines.

And this is exactly what it is "NOT" doing. Passing -Dfile.encoding="UTF-8" does "not" preserve the behavior of existing command lines! I think this shows that Java 18's implementation of JEP 400 is not doing what it should actually be doing, which is the root of your problem in the first place.
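
To check which of these settings actually take effect on a given setup, a small diagnostic along the following lines may help. This is only a sketch: the class name EncodingReport and the "(not set)" placeholder are made up for this example. The sun.* properties are unsupported and may be absent, native.encoding only exists on Java 17 and later, and stdout.encoding and stderr.encoding only on Java 19 and later.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {

    // Charset-related system properties; some are unsupported or only exist on newer JDKs.
    private static final String[] KEYS = {
        "file.encoding",
        "native.encoding",      // Java 17+
        "sun.stdout.encoding",  // unsupported, internal
        "sun.stderr.encoding",  // unsupported, internal
        "stdout.encoding",      // Java 19+
        "stderr.encoding",      // Java 19+
        "sun.jnu.encoding"      // unsupported, internal
    };

    // Collects the default charset and each property value, in a stable order.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        for (String key : KEYS) {
            report.put(key, System.getProperty(key, "(not set)"));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running it once with and once without the -D options above shows whether the IDE's run configuration actually passes them to the JVM.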

Mostafa Zeinali
  • 1
    [1] You are selectively quoting from JEP 400, and misrepresenting it. One of its stated goals is "Standardize on UTF-8 throughout the standard Java APIs, **except for console I/O**" [emphasis mine]. So JEP400 is explicitly not claiming to resolve any issues with UTF-8 for console I/O. [2] [Java bug 4163515](https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4163515) from 1998 (!!!) states: _the "file.encoding" property is not required by the J2SE platform specification... and should not be examined or modified by user code_. So using **-Dfile.encoding=UTF-8** has _never_ been supported. – skomisa Apr 18 '22 at 08:42
  • Hmmm.. Interesting. I think you're right on this one. But there's an important question here: Is there really no way to tell the JVM to create System.out and System.err with UTF-8? That just seems so odd to me! And JEP 400 wants to "Standardize", "except for console I/O". I'm well aware of that. BUT, it also aims to "preserve the behavior of existing command lines", which it simply does not! Now, this change of behavior could be the fault of IDEs like Eclipse and IntelliJ. I think this is something worth looking into, but I'm gonna leave that to others since my problem is solved! – Mostafa Zeinali Apr 19 '22 at 04:49
  • 2
    Actually there is a way, but I'm guessing that you won't like it much. From that bug report from 1998 which I cited above: _"The preferred way to change the default encoding used by the VM and the runtime system is to change the locale of the underlying platform before starting your Java program"_. It's astonishing that nearly 24 years later this apparently remains the "official" solution. Console I/O with Java is a bit of a mess. – skomisa Apr 19 '22 at 07:08
  • Wow! Yeah, to tell you the truth, I did search "how to change Windows default charset to UTF-8". What I found left much to be desired!!! – Mostafa Zeinali Apr 20 '22 at 05:46
  • 1
    This should make you smile: the document you linked to, JEP400, has recently been updated to state: **"Prior to deploying on a JDK where UTF-8 is the default charset, developers are strongly encouraged to check for charset issues by starting the Java runtime with java -Dfile.encoding=UTF-8 ... on their current JDK (8-17)"**. So they are now actively urging developers to use the same _"file.encoding"_ setting that in the past _"should not be examined or modified by user code"_!!! – skomisa May 05 '22 at 01:47
  • 1
    IntelliJ started using this as well. I'm on Idea 2022.2.4 something and this is the run arguments it uses for a simple main: "C:\Program Files\Java\jdk-19\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2022.2.3\lib\idea_rt.jar=28562:C:\Program Files\JetBrains\IntelliJ IDEA 2022.2.3\bin" -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8 – Mostafa Zeinali Nov 27 '22 at 00:22
  • 1
    Interesting. I've just learned that two new system properties were added in Java 19: **stdout.encoding** and **stderr.encoding**. See [JDK-8285492 Release Note: New System Properties for `System.out` and `System.err`](https://bugs.openjdk.org/browse/JDK-8285492): _"The properties can be overridden on the launcher's command line option (with `-D`) to set them to `UTF-8` where required."_ Those new properties are formally defined in the JDK19 Javadoc for [System.getProperties()](https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/System.html#getProperties()). – skomisa Nov 28 '22 at 02:49
  • 1
    So perhaps Intellij IDEA (for JDK19+) should use those properties as arguments rather than **sun.stdout.encoding** and **sun.stderr.encoding**, which have never been supported anyway. Try it yourself if you can, because that may be the better approach for any JDK >= 19. However, note that although user assignment of those two properties is supported, the only permitted value is _"UTF-8"_! – skomisa Nov 28 '22 at 02:52
  • 1
    Yeah, I think as of JDK 19, we have an official fix for UTF-8 support in Eclipse and IntelliJ consoles. Interesting that the only value permitted is UTF-8. I think it is only for platforms that do not have UTF-8 as the default charset, which is "ONLY WINDOWS"!! I mean, come on windows! – Mostafa Zeinali Nov 29 '22 at 03:13
  • 1
    Interestingly enough, in some locales, Windows has UTF-16 as the default charset! But the result of `[System.Text.Encoding]::Default` on my system is `Windows-1252` – Mostafa Zeinali Nov 29 '22 at 03:21
0

I had the same trouble. Changing the Default Encoding setting (File > Settings... > Editor > General > Console) to UTF-32 solved the issue for me.

0

Update IntelliJ IDEA to version 2022.2.1 or later. A very similar problem was classified as a bug. You can find more details here.