
I am running a small Java application on an embedded Linux platform. After replacing the Java VM JamVM with OpenJDK, file names with special characters are not stored correctly. Special characters like umlauts are replaced by question marks.

Here is my test code:

import java.io.File;
import java.io.IOException;

public class FilenameEncoding
{

        public static void main (String[] args) {
                String name = "umlaute-äöü";
                System.out.println("\nname = " + name);
                System.out.print("name in Bytes: ");
                for (byte b : name.getBytes()) {
                        System.out.print(Integer.toHexString(b & 255) + " ");
                }
                System.out.println();

                try {
                        File f = new File(name);
                        f.createNewFile();
                } catch (IOException e) {
                        e.printStackTrace();
                }
        }

}

Running it gives the following output:

name = umlaute-???
name in Bytes: 75 6d 6c 61 75 74 65 2d 3f 3f 3f

and a file called umlaute-??? is created.

Setting the properties file.encoding and sun.jnu.encoding to UTF-8 gives the correct strings in the terminal, but the created file is still named umlaute-???.

Running the VM with strace, I can see the system call

open("umlaute-???", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0666) = 4

This shows that the problem is not a file system issue: the VM itself already passes the question marks to open().

How can the encoding of the file name be set?

Roland Brand
  • Please see this link, which explains how to set the encoding: http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding – Phani Apr 11 '12 at 12:51
  • Setting file.encoding does not help. It only affects the file content, but not the file name. – Roland Brand Apr 11 '12 at 12:56
  • This might help you a bit: http://stackoverflow.com/questions/1184176/how-can-i-safely-encode-a-string-in-java-to-use-as-a-filename – Phani Apr 11 '12 at 12:59
  • Have you checked that the underlying file system even supports UTF-8? – Kru Apr 11 '12 at 13:57
  • I agree with Kru, you should make sure the file system allows this. I've run into the same problem with a RedHat distro, even though the locale was set to English and UTF-8. In my case, the easiest solution was to rename the files, but maybe it's not the same for you. – Sorin Apr 11 '12 at 14:09
  • I am sure that it is not a file system issue. I can create those files on the command line. Also, another VM, JamVM, could create and handle such files correctly. strace shows that the call to open() already contains the question marks instead of the ä, ö and ü. – Roland Brand Apr 11 '12 at 14:18
  • (Standard rant copied from another answer.) Do **not** use `new String(bytes[])`, do **not** use `string.getBytes()`, do **not** use `new InputStreamReader(InputStream)`, and do **not** use `new OutputStreamWriter(OutputStream)`. They use the platform default encoding, which is equivalent to depending on **a global variable with an essentially random value**. *Specify* the encoding you are using unless you want your program to break randomly in an inexplicable manner at some unpredictable point in the future on some other platform or for some other user. – Christoffer Hammarström Apr 12 '12 at 15:15
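
To illustrate the comment above, here is a minimal sketch of converting the test string with an explicitly named charset instead of the platform default. The class name `ExplicitCharsetDemo` and the helper `toHex` are made up for this example, and `StandardCharsets` assumes Java 7 or later. Note that this only covers string/byte conversion in your own code; it does not by itself change how the VM encodes file names passed to open().

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Hypothetical helper: formats bytes as space-separated hex,
    // in the same style as the test program in the question.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the literal independent of the source file encoding.
        String name = "umlaute-\u00e4\u00f6\u00fc";

        // Explicit charset: the bytes do not depend on file.encoding or the locale.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println("name in Bytes: " + toHex(utf8));

        // Decoding with the same explicit charset restores the original string.
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + decoded.equals(name));
    }
}
```

With UTF-8 named explicitly, each umlaut should come out as a two-byte sequence (c3 a4, c3 b6, c3 bc) rather than 3f, whatever the default encoding of the VM is.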

3 Answers


If you are using Eclipse, then you can go to Window->Preferences->General->Workspace and select the "Text file encoding" option you want from the pull down menu. By changing mine around, I was able to recreate your problem (and also change back to the fix).

If you are not, then you can add an environment variable in Windows (System properties->Environment Variables, then select New... under system variables). The name should be (without quotes) JAVA_TOOL_OPTIONS and the value should be set to -Dfile.encoding=UTF8 (or whatever encoding will get yours to work).

I found the answer through this post, by the way: Setting the default Java character encoding? (http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding)

Linux Solutions

-(Permanent) Running env | grep LANG in the terminal will give you one or two lines showing which encoding Linux is currently set up with. You can then set LANG to UTF8 (yours might be set to ASCII) in the /etc/sysconfig/i18n file (I tested this on Fedora with a 2.6.40 kernel). Basically, I switched from UTF8 (where I had odd characters) to ASCII (where I had question marks) and back.

-(When starting the JVM, but this may not fix the problem) You can start the JVM with the encoding you want using java -Dfile.encoding=**** FilenameEncoding. Here is the output with and without the option:

[youssef@JoeLaptop bin]$ java -Dfile.encoding=UTF8 FilenameEncoding

name = umlaute-הצ�
name in Bytes: 75 6d 6c 61 75 74 65 2d d7 94 d7 a6 ef bf bd 
UTF-8
UTF8

[youssef@JoeLaptop bin]$ java FilenameEncoding

name = umlaute-???????
name in Bytes: 75 6d 6c 61 75 74 65 2d 3f 3f 3f 3f 3f 3f 3f 
US-ASCII
ASCII
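
To see which settings the VM actually picked up, a small diagnostic along the following lines may help. This is only a sketch: the class name `EncodingInfo` and its helpers are made up for illustration, and `sun.jnu.encoding` is an implementation-specific property (it is the one mentioned in the question) that other VMs may not define at all.

```java
import java.nio.charset.Charset;

public class EncodingInfo {

    // Returns the system property's value, or "(not set)" if the VM does not define it.
    public static String prop(String key) {
        String value = System.getProperty(key);
        return value == null ? "(not set)" : value;
    }

    // Returns the environment variable's value, or "(not set)" if it is absent.
    public static String env(String key) {
        String value = System.getenv(key);
        return value == null ? "(not set)" : value;
    }

    public static void main(String[] args) {
        // Locale-related environment as seen by the Java process.
        System.out.println("LANG             = " + env("LANG"));
        System.out.println("LC_ALL           = " + env("LC_ALL"));

        // Encoding-related properties; sun.jnu.encoding is VM-specific.
        System.out.println("file.encoding    = " + prop("file.encoding"));
        System.out.println("sun.jnu.encoding = " + prop("sun.jnu.encoding"));

        // The charset used by methods that take no explicit charset argument.
        System.out.println("default charset  = " + Charset.defaultCharset().name());
    }
}
```

Running this once on the desktop and once on the embedded target should show whether the two environments differ in locale or in the properties the VM derives from it.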

Here is a reference for the Linux part: http://www.cyberciti.biz/faq/set-environment-variable-linux/

And here is one about -Dfile.encoding: Setting the default Java character encoding? (http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding)

Youssef G.
  • I checked the encoding of the file name in the compiled .class file. There it is correct. The same .class file works on the desktop Linux, but not on the embedded one. – Roland Brand Apr 11 '12 at 14:21
  • Can you give more information on the Linux you are using? The idea is the same, you just need to adapt it to the program/OS that starts the JVM. – Youssef G. Apr 11 '12 at 15:13
  • It is a kernel 2.6.30 running on an ARM v5 processor (Atmel AT91SAM9G20). An interesting fact is that JamVM could handle such file names, but OpenJDK cannot. What OS features does OpenJDK depend on? – Roland Brand Apr 12 '12 at 06:57
  • Updated my answer! Hope that helps. You can also use an input and output stream, but I think your issue is that Linux is set up with a LANG that doesn't support your characters. I could only get mine to write the file correctly if I was in the correct LANG, by the way. Otherwise I wouldn't get ??? in the file name (although the terminal showed ???), but I would get this: ×צ� – Youssef G. Apr 12 '12 at 13:57

I know it's an old question but I had the same problem. None of the solutions mentioned worked for me, but the following solved it:

  • Source encoding set to UTF-8 (project.build.sourceEncoding set to UTF-8 in the Maven properties)
  • Program arguments: -Dfile.encoding=utf8 and -Dsun.jnu.encoding=utf8
  • Using java.nio.file.Path instead of java.io.File
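
The third point could look roughly like the sketch below. The class name `NioFilenameDemo` and the helper `create` are invented for this example, and it writes into a temporary directory rather than the working directory. As far as I know, on some VMs `java.nio.file` rejects a name it cannot encode with an `InvalidPathException` instead of silently substituting question marks, so the sketch catches that case; treat this as an assumption to verify on your platform.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

public class NioFilenameDemo {

    // Hypothetical helper: creates an empty file called name inside dir and returns its path.
    public static Path create(Path dir, String name) throws IOException {
        return Files.createFile(dir.resolve(name));
    }

    public static void main(String[] args) throws IOException {
        // Use a scratch directory so the test does not litter the working directory.
        Path dir = Files.createTempDirectory("filename-encoding");
        try {
            // Unicode escapes keep the literal independent of the source file encoding.
            Path created = create(dir, "umlaute-\u00e4\u00f6\u00fc");
            System.out.println("created: " + created);
        } catch (InvalidPathException e) {
            // Thrown when the name cannot be converted to a valid path on this platform.
            System.out.println("name rejected: " + e.getMessage());
        }
    }
}
```

Either outcome is informative: a correctly named file means the file name encoding is set up properly, while an exception points at the VM's file name encoding rather than at your code.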
Stefan A

Your problem is that javac is expecting a different encoding for your .java-file than you have saved it as. Didn't javac warn you when you compiled?

Maybe you have saved it with encoding ISO-8859-1 or windows-1252, and javac is expecting UTF-8.

Provide the correct encoding to javac with the -encoding flag, or the equivalent for your build tool.
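
A quick way to check whether the source encoding is involved at all is to write the umlauts as Unicode escapes and print the UTF-16 code units of the string, neither of which depends on how javac reads the file or on the platform default encoding. This is just a sketch; the class name `EscapedName` and the helper `codeUnits` are made up for the example.

```java
public class EscapedName {

    // Same name as in the question, written with Unicode escapes so the
    // literal does not depend on the encoding javac assumes for the source.
    public static final String NAME = "umlaute-\u00e4\u00f6\u00fc";

    // Hypothetical helper: renders each UTF-16 code unit as four hex digits.
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only ASCII is printed, so the terminal encoding cannot distort the result.
        System.out.println(codeUnits(NAME));
    }
}
```

If the last three values are 00e4 00f6 00fc here but differ when you do the same with the literal from your original source file, the file was compiled with the wrong -encoding. If they match in both cases, the source encoding is not the culprit.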

Christoffer Hammarström