https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
With insider build 17035 and the April 2018 update (nominal build 17134) for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox appeared for setting the locale code page to UTF-8.
This actually works for me. Without it, no matter what I set chcp to or what I supplied as -Dsun.jnu.encoding, the argument was always garbled.
I had a test class that would just print the argument that is passed to it:
Before:
> java test "üůßβαa"
üußßaa
Interestingly, with sun.jnu.encoding=Cp1252, U+03B2 (beta, β) becomes a German sharp s (ß), the Czech ů becomes a plain u, and α becomes a plain a.
> chcp 65001
Active code page: 65001
> java test "üůßβαa"
uaa
Hmm…
> java -Dsun.jnu.encoding=utf-8 test "üůßβαa"
?u??aa
This is not better. And it becomes worse when CJK characters come into play, for example U+4E80 (亀):
> java test "üůßβαa亀"
uaa?
Exception in thread "main" java.nio.file.InvalidPathException: Illegal char <?> at index 6: uaa?
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.nio.file.Paths.get(Unknown Source)
    at test.urify(test.java:33)
    at test.urify(test.java:43)
    at test.main(test.java:13)
The class that I used not only prints its argument, it also tries to convert it to a file: URI, and it crashed.
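My actual test class isn't shown here, so the following is only a minimal sketch of a class with the behaviour described: it prints its first argument and then turns it into a file: URI. The class name ArgEcho and the body of urify are assumptions for illustration, not the original code. The Paths.get call is the step that throws InvalidPathException when the argument arrives with a literal ? in it, since ? is not allowed in a Windows path.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical reconstruction; the original test class is not part of this answer.
public class ArgEcho {

    // Converts a file name to an absolute file: URI.
    // Paths.get throws InvalidPathException if the name contains a character
    // that is illegal in a path on the current platform (e.g. '?' on Windows).
    static URI urify(String name) {
        return Paths.get(name).toAbsolutePath().toUri();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java ArgEcho <text>");
            return;
        }
        // What is printed also depends on the console encoding,
        // so garbling here can have more than one cause.
        System.out.println(args[0]);
        System.out.println(urify(args[0]));
    }
}
```

Run it the same way as above, e.g. java ArgEcho "üůßβαa".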
Setting the Windows locale to UTF-8 with the approach quoted above solved this issue.
Unfortunately, it didn’t fix encoding issues with arguments passed to another Java program, the XProc processor XML Calabash. A sample pipeline that takes a value from the command line and inserts it as an attribute into a document yielded this mojibake:
> calabash.bat Untitled3.xpl foo='rαaßβöů亊'
<doc xmlns:c="http://www.w3.org/ns/xproc-step" foo="rαaßβöů亊">Hello world!</doc>
Adding -Dsun.jnu.encoding=UTF-8 to the Java invocation fixed this:
<doc xmlns:c="http://www.w3.org/ns/xproc-step" foo="rαaßβöů亊">Hello world!</doc>
For completeness, before switching the Windows locale to UTF-8, depending on whether the code page was 1252 or 65001, the invocation yielded different variations of mojibake that -Dsun.jnu.encoding=UTF-8 couldn’t fix.
So the beta feature to switch the Windows locale finally seems to solve this issue. Some applications might need an additional -Dsun.jnu.encoding=UTF-8, for reasons not thoroughly researched.
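If you want to check what a given JVM invocation actually sees, a small probe like the one below may help. It is only a sketch, and the class name EncodingProbe is made up for illustration. It prints the encoding-related system properties mentioned above and dumps each argument as Unicode code points, so the result can be read even when the console itself can't display the characters.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper, not part of any of the programs discussed above.
public class EncodingProbe {

    // Renders each code point as U+XXXX, separated by spaces,
    // so the output is pure ASCII and independent of the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());
        for (String arg : args) {
            System.out.println(codePoints(arg));
        }
    }
}
```

For an argument "üůßβαa" that arrives intact, the dump should read U+00FC U+016F U+00DF U+03B2 U+03B1 U+0061. A U+003F in the dump means the character was already replaced by a question mark before the program got to see it.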
This doesn’t solve your years-old issue with Windows 2000. But maybe you have switched to Windows 10 in the meantime.
Ah, btw, I ran your program and it works with the Windows UTF-8 locale setting.
> java test t=r_ä亀
> type C:\Temp\abc.txt
t=r_ä亀