This problem has been dropped on me, and I have spent a couple of days on unsuccessful searches and workaround attempts.
I maintain an internal Java Swing program, distributed via JNLP/Web Start to OSX and Windows computers, that, among other things, downloads some files from WebDAV.
Recently, on a test machine with OSX 10.8 and Java 7, accented characters in file names and directory names started being replaced by question marks.
There is no problem on OSX with versions of Java before 7.
Example:
XXXYYY_è_ABCD/
becomes
XXXYYY_?_ABCD/
Using java.text.Normalizer (NFD, NFC, NFKD, NFKC) on the original string gives a different but still wrong result:
XXXYYY_e?_ABCD/
or
XXXYYY_e_ABCD/
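To make the two symptoms concrete, here is a minimal sketch that reproduces them outside the file system. It assumes the name ends up being encoded with an ASCII-only charset somewhere along the way; that is a guess, not something confirmed, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizerForms {
    // Round-trips a string through US-ASCII; unmappable chars become '?'.
    // ASSUMPTION: this only imitates what an ASCII-only file name encoding would do.
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Lists every char as U+XXXX so composed and decomposed forms can be told apart.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = "XXXYYY_\u00E8_ABCD"; // U+00E8 is the precomposed e-grave
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(name, Normalizer.Form.NFD);

        // NFC keeps one char for the accented letter; NFD splits it into 'e' + U+0300.
        System.out.println("NFC: " + describe(nfc));
        System.out.println("NFD: " + describe(nfd));

        // The composed form loses the whole letter, the decomposed form loses only the accent.
        System.out.println(throughAscii(nfc)); // XXXYYY_?_ABCD
        System.out.println(throughAscii(nfd)); // XXXYYY_e?_ABCD
    }
}
```

If that assumption holds, normalization alone cannot help: it only changes which characters get turned into question marks.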
I know from correspondence between [andrew.brygin at oracle.com] and [mik3hall at gmail.com] that:
Yes, file.encoding is set based on the locale that the jvm is running on, and if you run your java vm in xxxx.UTF-8 locale, the file.encoding should be UTF-8, set to MacRoman will be problematic. So I believe Oracle/OpenJDK7 behaves correctly. That said, as Andrew Thompson pointed out, if all previous Apple JDK releases use MacRoman as the file.encoding for english/UTF-8 locale, there is a "compatibility" concern here, it might worth putting something in the release note to give Oracle/OpenJDK MacOS user a heads up.
From Joni Salonen's blog (java-and-file-names-with-invalid-characters) I know that:
You probably know that Java uses a “default character encoding” to convert binary data to Strings. To read or write text using another encoding you can use an InputStreamReader or OutputStreamWriter. But for data-to-text conversions deep in the API you have no choice but to change the default encoding.
and
What about file.encoding?
The file.encoding system property can also be used to set the default character encoding that Java uses for I/O. Unfortunately it seems to have no effect on how file names are decoded into Strings.
Executing locale from inside the JNLP invariably prints:
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
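To compare the locale output above with what the JVM itself reports, a small diagnostic like the following can be run both from the shell and from inside the JNLP. This is only a sketch: the class name is hypothetical, `sun.jnu.encoding` is an undocumented, implementation-specific property that may be absent on some JVMs, and reading system properties is assumed to be permitted (for example, a signed Web Start application with all permissions).

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Collects the encoding-related settings visible to the running JVM.
    // Values may be null when a property or variable is not set.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("file.encoding", System.getProperty("file.encoding"));
        // ASSUMPTION: undocumented property, reported by Oracle/OpenJDK builds
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        report.put("defaultCharset", Charset.defaultCharset().name());
        report.put("LANG", System.getenv("LANG"));
        report.put("LC_CTYPE", System.getenv("LC_CTYPE"));
        report.put("LC_ALL", System.getenv("LC_ALL"));
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Comparing the two runs side by side should show whether the difference between the shell launch and the Web Start launch is in the environment, in the JVM properties, or in both.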
The most similar problem on Stack Overflow that has a solution is encoding-issues-on-java-7-file-names-in-os-x,
but the solution there is to wrap the execution of the Java program in a script:
#!/bin/bash
export LC_CTYPE="UTF-8" # Try other options if this doesn't work
exec java your.program.Here
I don't think this option is available to me because of Web Start, though, and I haven't found any way to set the LC_CTYPE environment variable from within the program.
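For reference, this is the limitation in code form: the map returned by `System.getenv()` is read-only, and the only environment the standard API lets a program edit is that of a child process it is about to start. The class and method names below are hypothetical, and the `java -version` command is just a placeholder that is never actually started.

```java
import java.util.Map;

public class EnvLimits {
    // Tries to change this JVM's own environment; the standard API does not allow it.
    static boolean canModifyOwnEnvironment() {
        try {
            System.getenv().put("LC_CTYPE", "UTF-8");
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // System.getenv() returns an unmodifiable map
        }
    }

    // Prepares (but does not start) a child process with LC_CTYPE overridden.
    // ASSUMPTION: "java -version" stands in for whatever would really be launched.
    static Map<String, String> childEnvironment(String lcCtype) {
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        Map<String, String> env = pb.environment(); // modifiable copy for the child
        env.put("LC_CTYPE", lcCtype);
        return env;
    }

    public static void main(String[] args) {
        System.out.println("own environment modifiable: " + canModifyOwnEnvironment());
        System.out.println("child LC_CTYPE: " + childEnvironment("UTF-8").get("LC_CTYPE"));
    }
}
```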
Any solutions or workarounds?
P.S.:
If we run the program directly from a shell, it writes the files and directories correctly even on OSX 10 + Java 7. The problem appears only with the combination of JNLP + OSX + Java 7.