We are running a Java web application on a Linux server whose default locale is "POSIX". Some of our clients upload files whose names contain non-ASCII characters. Inside Java these characters are preserved, since Java strings are Unicode, but they are lost once we actually save the uploaded file to the file system (the saved file name contains many question marks), because the system's default locale does not support non-ASCII characters. Is there any way to specify a character set for the file name (not the content) before saving a file in Java?
2 Answers
The portable Java API has no concept of a file-system character encoding, because such a concept would not be portable: Windows, for example, stores file names as Unicode regardless of the locale. On Linux, however, the LC_CTYPE facet of your locale determines the encoding the JVM uses to convert file names to bytes. So if you export LC_CTYPE=en_US.utf8 (or a similar UTF-8 locale) in the environment before launching your Java application, the application will use that encoding for file-name handling.
Also see the question "file.encoding has no effect, LC_ALL environment variable does it", which discusses some of the internals behind this conversion.
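To check whether the locale change took effect, a small diagnostic like the following may help. It is a sketch under assumptions: the class name FileNameEncodingCheck is hypothetical, and sun.jnu.encoding is an internal, undocumented OpenJDK/HotSpot property (it may be null on other JVMs) that reflects the charset derived from LC_CTYPE. The check creates a file with a non-ASCII name in a temporary directory and verifies that the same name is listed back. On OpenJDK, the NIO Path API typically rejects unmappable names with InvalidPathException instead of silently substituting '?' the way java.io.File does.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FileNameEncodingCheck {
    // Returns true if a file created under `name` is listed back with exactly that name.
    static boolean roundTrips(Path dir, String name) throws IOException {
        Path p;
        try {
            p = dir.resolve(name);
        } catch (InvalidPathException ex) {
            // The JVM could not encode the name in the file-name charset at all.
            return false;
        }
        Files.createFile(p);
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.anyMatch(entry -> entry.getFileName().toString().equals(name));
        } finally {
            Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Internal, JDK-specific property (assumption): the charset used for file names.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());
        Path dir = Files.createTempDirectory("fn-check");
        try {
            // "café-日本.txt", written with escapes so the source compiles in any encoding.
            boolean ok = roundTrips(dir, "caf\u00e9-\u65e5\u672c.txt");
            System.out.println("round trip ok    = " + ok);
        } finally {
            Files.delete(dir);
        }
    }
}
```

Running it once under the default POSIX locale and once with LC_CTYPE=en_US.utf8 exported (the locale must be installed on the server) should show the difference the answer describes.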
I have been beating my head against the wall, thanks for this! – eric Jan 09 '15 at 19:55
If the files are entirely under your application's control, rather than being uploaded for another application to use, I would consider encoding and decoding the file names yourself before saving them: for example, use URLEncoder.encode(filename, "UTF-8") to map a user-supplied name to one you can safely use on disk, and URLDecoder.decode(encodedName, "UTF-8") to map it back.
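A minimal sketch of that mapping, assuming Java 10 or later for the Charset overloads of URLEncoder.encode and URLDecoder.decode (the String "UTF-8" overloads in the answer behave the same but throw the checked UnsupportedEncodingException). The class and method names are hypothetical. The result is pure ASCII, so it survives any locale. One caveat worth handling: URLEncoder leaves '.' unencoded, so names like ".." would pass through unchanged; the sketch rejects them.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SafeFileNames {
    // Maps a user-supplied name to an ASCII-only name that is safe under any locale.
    static String toDiskName(String userName) {
        // '.' is not encoded, so "." and ".." would survive; reject them to stay
        // inside the upload directory.
        if (userName.isEmpty() || userName.equals(".") || userName.equals("..")) {
            throw new IllegalArgumentException("invalid file name: " + userName);
        }
        // Non-ASCII characters become %XX escapes of their UTF-8 bytes, '/' becomes
        // %2F, and a space becomes '+'.
        return URLEncoder.encode(userName, StandardCharsets.UTF_8);
    }

    // Recovers the original user-supplied name from the on-disk name.
    static String fromDiskName(String diskName) {
        return URLDecoder.decode(diskName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "résumé 日本.pdf", written with escapes so the source compiles in any encoding.
        String original = "r\u00e9sum\u00e9 \u65e5\u672c.pdf";
        String onDisk = toDiskName(original);
        System.out.println(onDisk);
        System.out.println(fromDiskName(onDisk).equals(original));
    }
}
```

Keep in mind that the encoded name is longer than the original (each CJK character becomes nine ASCII characters), which counts against the file system's name-length limit, typically 255 bytes on Linux.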
