Java isn't able to recognize Unicode characters in paths when the Windows option Beta: Use Unicode UTF-8 for worldwide language support is enabled.
The path to my user folder is C:\Users\Otávio Augusto Silva, and the á character is causing trouble for Java. If the JDK is installed inside my user folder using scoop install, calling the javac command gives the following result:

Erro: Não é possível carregar a classe principal com.sun.tools.javac.Main no módulo jdk.compiler
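        java.lang.UnsatisfiedLinkError: no jimage in system library path: C:\Users\OtÃ¡vio Augusto Silva\scoop\apps\zulu-jdk\current\bin

The first line of the error means "Error: cannot load main class com.sun.tools.javac.Main in module jdk.compiler". Notice that the path in the second line shows the á character as Ã¡.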
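
For illustration, this kind of substitution is what results when the UTF-8 bytes of á are decoded with a single-byte charset such as windows-1252: the two bytes 0xC3 0xA1 become the two characters Ã and ¡. A minimal sketch, assuming that is what happens here (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes with a different charset.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // \u00e1 is á; the escape keeps this source file independent of its own encoding.
        String original = "Ot\u00e1vio";
        Charset cp1252 = Charset.forName("windows-1252");
        // How the result is displayed depends on the console's own encoding.
        System.out.println(misdecode(original, cp1252));
    }
}
```

The returned string is one character longer than the input, because the single character á was encoded as two bytes and each byte was then read as its own character.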
If the JDK is installed globally using scoop install -g, choco install, or the default installer from any JDK distribution, the commands work fine, but if I call javac and pass the whole path, it gives an error:

C:\Users\Otávio Augusto Silva>javac "C:\Users\Otávio Augusto Silva\Documents\Code\Java\Hello World\main.java"
error: file not found: C:\Users\Otávio Augusto Silva\Documents\Code\Java\Hello World\main.java
Usage: javac <options> <source files>
use --help for a list of possible options

To reproduce, do the following:

  • Have a user folder whose name contains a non-ASCII Latin character (something like á, é, ã, etc.)
  • Have the Beta: Use Unicode UTF-8 for worldwide language support in region settings enabled
  • Install your favorite JDK distribution
  • Call javac passing the whole path like C:\Users\USERFOLDER\PATH\TO\FILE\file.java

The error should appear.
I've been stuck on this for days; any help would be greatly appreciated.
Some relevant info:

  • I'm using cmd in Windows Terminal app, but PowerShell gives the same error
  • The chcp command gives the code 65001
  • I already tried the solution presented here; it didn't work
  • I suspect the chcp setting is the problem. javac doesn’t inherit that setting; it uses the system charset, which is likely the windows-1252 charset, so it assumes each byte of the file name is a character (because windows-1252 is a one-byte-per-character encoding). But the chcp setting causes the filename to be *sent* to javac as UTF-8 bytes, which means `á` is sent as two bytes. – VGR Sep 09 '22 at 17:37
  • works fine for me https://i.stack.imgur.com/lu0Mc.png – user16320675 Sep 09 '22 at 17:47
  • @VGR is there any way I can make this issue about the incompatibility between the system charset and what `chcp` is configured to do? – Otávio Augusto Silva Sep 09 '22 at 17:48
  • @user16320675 do you have the Windows option I mentioned before? – Otávio Augusto Silva Sep 09 '22 at 17:48
  • We've been through this and I thought we decided you could use DOS names? So that's ```C:\Users\OTVIOA~1``` We discussed that [here](https://chat.stackoverflow.com/transcript/message/55190875#55190875) So ```javac "C:\Users\OTVIOA~1\Documents\Code\Java\Hello World\main.java"``` – g00se Sep 09 '22 at 18:21
  • @g00se I deleted that question because it was very poorly written, but I am using your suggestion, I just want a more permanent solution like making java read the path properly – Otávio Augusto Silva Sep 09 '22 at 19:40
  • @user16320675 You're using English (US), I'm using Portuguese (Brazil), that may impact it somehow – Otávio Augusto Silva Sep 09 '22 at 19:40
  • @user16320675 I tested Zulu, Microsoft, Temurin by Adoptium, Oracle, OpenJDK and Liberica, all behave the same – Otávio Augusto Silva Sep 09 '22 at 21:00
  • `java -XshowSettings:properties -version 2>&1 | find "user"` could be informative – g00se Sep 09 '22 at 21:00
  • @g00se the result of your command: https://pastebin.com/6AD1j7xY – Otávio Augusto Silva Sep 11 '22 at 18:44
  • Yes, that's really confused the terminal in which you executed that. Was it cmd.exe or Powershell? – g00se Sep 11 '22 at 20:22
  • @g00se In CMD, it doesn't work on PowerShell, it says the `find` command has incorrect parameter format – Otávio Augusto Silva Sep 11 '22 at 21:02
  • Powershell might prefer ```java -XshowSettings:properties -version 2>&1 | grep "user"``` – g00se Sep 12 '22 at 14:18
  • @g00se It gives the exact same result – Otávio Augusto Silva Sep 12 '22 at 18:36
  • *it says the find command has incorrect parameter format* *It gives the exact same result* Both of those statements cannot be true simultaneously - in the second case, the ```find``` command is not even used – g00se Sep 12 '22 at 21:17
  • @g00se I mean that using `java -XshowSettings:properties -version 2>&1 | grep "user"` gives the result in the pastebin link I posted earlier – Otávio Augusto Silva Sep 12 '22 at 21:27
  • Run [this](https://technojeeves.com/tech/Raw.class) and let me know if the first line prints the second character of your name or if the last line prints it correctly – g00se Sep 12 '22 at 22:36
  • @g00se The output: https://pastebin.com/n6aXPwLM – Otávio Augusto Silva Sep 13 '22 at 00:08
  • OK, that's good. Of course, I meant the 3rd character ;) This has proved that the encoding in use in that terminal *is* UTF-8 (which I was beginning to doubt) – g00se Sep 13 '22 at 08:41
  • Does the following work? ```C:\Users\Otávio Augusto Silva>javac -encoding UTF-8 "C:\Users\Otávio Augusto Silva\Documents\Code\Java\Hello World\main.java"``` – g00se Sep 13 '22 at 08:46
  • @g00se Sadly, no, it gives the error `error: file not found: C:\Users\Otávio Augusto Silva\Documents\Code\Java\Hello World\main.java` – Otávio Augusto Silva Sep 13 '22 at 10:59
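
The encoding-related values that `java -XshowSettings:properties` prints can also be read from inside a program. A minimal sketch (the class name EncodingReport is hypothetical; native.encoding is only defined on JDK 17 and later, and sun.jnu.encoding is an internal, unsupported property, so either may print as null):

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingReport {
    // Collect the charset and user-path properties relevant to this problem.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        String[] keys = {
            "file.encoding", "native.encoding", "sun.jnu.encoding", "user.home", "user.dir"
        };
        for (String key : keys) {
            // String.valueOf turns a missing property into the text "null".
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If user.home prints with Ã¡ in place of á, the JVM has read the path with a charset other than the one the system supplied it in.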

1 Answer

Using your directory name (Otávio Augusto Silva), I can reproduce your problem on Windows 10 as well, using Java 18. Unfortunately, this looks like a specific example of a more general and longstanding problem documented in this open and unresolved JDK bug:

JDK-4488646 Java executable and System properties need to support Unicode on Windows

This is part of the bug report's description, with my emphasis added:

To make Java completely Unicode-aware on NT we need to

  1. Modify System properties initialization code and all other places where Windows calls are used to use wide-char calls on NT.

  2. Modify java, javac etc. to be able to use Unicode in classpath and other command line arguments.

That bug report was created in 2001! It relates to Windows NT, but since it remains open and unresolved I assume it has general applicability for all flavors of Windows, including Windows 10 and 11.

Notes:

  • Although it doesn't help to resolve your specific problem, it is fairly straightforward "to use wide-char calls" within your Java application (as mentioned in the bug description above) using JNA. For example, your code could successfully process Otávio Augusto Silva if it was passed as a command line argument to your Java application. See this SO answer for the code to do that.

  • Also see open and unresolved JDK bug report JDK-8124977 cmdline encoding challenges on Windows which was raised in 2015. It includes some discussion on the differences between using java from cmd and PowerShell windows on Windows.

========================================================

(This update is based on comments from @user16320675.)

It seems the issue is fully resolved in Java 19 (download from here) which is due to be released later this month. From the screen shot below:

  • The call to javac will succeed when using JDK 19.

  • The same call to javac will fail when using JDK 18, because the file name D:\Otávio... is processed as D:\OtÃ¡vio....

    [Screenshot: javac calls]

I can't find any mention of this fix in the JDK 19 Release Notes.

========================================================

(This update shows what happens if the beta option is not enabled.)

If the option Beta: Use Unicode UTF-8 for worldwide language support is not enabled I cannot reproduce the problem; the call to javac works fine using both JDK 18 and JDK 19:

[Screenshot: Beta option not enabled]

Note that this works even though the code page in the cmd window is 437, not 65001. Of course there are a couple of significant differences between your environment and mine:

  • You are using Windows 11 and I am using Windows 10.
  • My system locale is English (United States), and I assume that yours is different.

To summarize how to resolve this issue:

  • Unless you have that beta option enabled for some specific reason, consider just disabling it.
  • If you want to keep the option enabled, consider upgrading to Java 19.

========================================================

Update: The following bug was fixed in Java 19:

8272352: Java launcher can not parse Chinese character when system locale is set to UTF-8 #530

Although that bug fix specifically relates to file names passed to java, I think it probably explains why the OP's problem with javac is also resolved in Java 19.
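
One way to check whether a given JDK passes a non-ASCII command line argument through intact is to print the code points of each argument. A minimal sketch (the class name ArgDump is hypothetical, and I have not confirmed that it exercises exactly the code path changed by that fix):

```java
class ArgDump {
    // Render every code point of the argument as U+XXXX, separated by spaces.
    static String describe(String arg) {
        StringBuilder sb = new StringBuilder();
        arg.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(describe(arg));
        }
    }
}
```

Run it with something like `java ArgDump.java Otávio`. An argument that arrives intact shows á as the single code point U+00E1, while one whose UTF-8 bytes were misread as windows-1252 shows U+00C3 U+00A1 instead.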

skomisa
  • as I already had commented (now deleted), I could reproduce the error with Java 18, but not with Java 19 (my default): [screenshot](https://i.stack.imgur.com/X5LlT.png) – user16320675 Sep 16 '22 at 21:02
  • @user16320675 Right - I checked Java 19 based on your comment. (No idea why you would delete such a helpful comment!) It looks like 19 resolves the Unicode issue, but not the long Windows file name issue. – skomisa Sep 16 '22 at 22:35
  • @user16320675 [1] You are correct. I had introduced an unrelated error by omitting the `Example` directory from the file path, and I have corrected my answer accordingly. As you have shown, it looks like Java 19 solves the OP's problem completely. [2] I'm not sure if [enabling the Beta: Use Unicode UTF-8 for worldwide language support](https://stackoverflow.com/q/56419639/2985643) option is also required, but I have it set. Regardless, I don't have your JAVA_TOOLS_OPTIONS setting, and I think it may be redundant if that beta setting is enabled. – skomisa Sep 17 '22 at 16:34
  • @skomisa So there's no way to solve this error on JDK 17 or 18? – Otávio Augusto Silva Sep 19 '22 at 12:54
  • @OtávioAugustoSilva As I mentioned in my answer, you can resolve the issue if you are using JDK 18 by NOT enabling the *Beta: Use Unicode…* option. At least that worked for me - see the final screen shot above. Do you have that option enabled for a specific reason? If not, try disabling it to see if that fixes the problem – skomisa Sep 19 '22 at 19:43
  • @skomisa I can't disable the Beta option otherwise it screws with other programs/compilers because of the `á` character in my name. I will try disabling it again, and see what changes – Otávio Augusto Silva Sep 19 '22 at 22:48
  • @OtávioAugustoSilva OK, understood. In that case I don't see any solution except using JDK 19. Or at least try it just to verify that it fixes your problem, since your environment is not identical to mine. Also, what is your Windows _Current system locale_, as shown in Control Panel's "Region" settings? Mine is _(English) United States_. – skomisa Sep 20 '22 at 03:17
  • @skomisa Just tested, OpenJDK 19 does fix the problem and it compiles and runs Java code normally. Sadly, I still can't play Minecraft, but unfortunately we can't have everything :( – Otávio Augusto Silva Sep 20 '22 at 14:00
  • @OtávioAugustoSilva That's good. [JDK 19 was released today for general availability](https://openjdk.org/projects/jdk/19/), so that seems like the best (only?) approach for your situation. I never did find the bug associated with the fix though. – skomisa Sep 20 '22 at 19:47
  • @OtávioAugustoSilva I updated my answer to include the bug fix that (probably) resolves the issue with JDK 19: [JDK-8272352](https://bugs.openjdk.org/browse/JDK-8272352) – skomisa Sep 20 '22 at 20:40
  • @skomisa Thank you for your answer, and I forgot about your question, sorry. My locale is _Portuguese (Brazil)_ – Otávio Augusto Silva Sep 20 '22 at 21:08