
It seems that Java fails to correctly encode Strings when ProcessBuilder or Runtime.exec passes them as arguments to the process it spawns, even with -Dfile.encoding set, for reasons that I don't understand. This means characters outside the basic Latin range (Chinese, Japanese etc.) don't get passed along intact to the child process.

As a simple example, compile the following two test classes, substituting the path to your own java.exe in Test1 and whatever output file path you like in Test2:

import java.io.IOException;
import java.nio.charset.Charset;

public class Test1 {
    public static void main(String[] args) throws IOException {
        String s = "因";
        System.out.println(bytesToHex(s.getBytes(Charset.forName("UTF-8"))));
        Runtime.getRuntime().exec(new String[]{"C:\\Program Files\\Java\\jdk1.6.0_45\\bin\\java.exe", "-cp", ".", "Test2", s});
    }

    public static String bytesToHex(byte[] bytes) {
        char[] hexArray = "0123456789ABCDEF".toCharArray();
        char[] hexChars = new char[bytes.length * 2];
        for ( int j = 0; j < bytes.length; j++ ) {
            int v = bytes[j] & 0xFF;
            hexChars[j * 2] = hexArray[v >>> 4];
            hexChars[j * 2 + 1] = hexArray[v & 0x0F];
        }
        return new String(hexChars);
    }
}

and

import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;

public class Test2 {
    public static void main(String[] args) throws IOException {
        FileWriter w = null;
        try {
            w = new FileWriter("<some directory>\\testoutput.txt");
            w.write(Test1.bytesToHex(args[0].getBytes(Charset.forName("UTF-8"))));
        } finally {
            if (w != null) w.close();
        }
    }
}

Then run Test1:

java -Dfile.encoding=UTF-8 Test1

Observe that Test1 prints out "E59BA0", whilst Test2 writes "3F" ('?') to file.
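For what it's worth, "3F" is exactly what you get when a String is encoded into a charset that cannot represent the character. The sketch below only illustrates that replacement behaviour and is not a confirmed diagnosis: ISO-8859-1 stands in for whatever charset might be involved in the argument hand-off, and the class name ReplacementDemo is made up for this example.

```java
import java.nio.charset.Charset;

public class ReplacementDemo {
    // Upper-case hex dump of a byte array, equivalent to bytesToHex above.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u56E0"; // the same character used in Test1

        // UTF-8 can represent the character, so its three bytes survive.
        System.out.println(toHex(s.getBytes(Charset.forName("UTF-8"))));      // E59BA0

        // ISO-8859-1 cannot, so getBytes substitutes the replacement byte '?'.
        System.out.println(toHex(s.getBytes(Charset.forName("ISO-8859-1")))); // 3F
    }
}
```

So the output from Test2 looks consistent with the argument having gone through a lossy conversion somewhere between the two JVMs, although this does not show where that conversion happens.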

Can anyone explain why this happens, and what the correct way is to pass such strings to a child process?
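One workaround I can think of, assuming the problem is limited to non-ASCII characters in the argument, is to keep the command line pure ASCII by sending the UTF-8 bytes as a hex string and decoding them in the child. This is only a sketch: the class HexArgs and its encode/decode methods are hypothetical helpers, not part of any library, and I have not verified it against the Windows setup above.

```java
import java.nio.charset.Charset;

public class HexArgs {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Parent side: turn the String into ASCII-only hex of its UTF-8 bytes.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(UTF8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Child side: rebuild the original String from the hex argument.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        String original = "\u56E0";
        String wire = encode(original);                      // what would go on the command line
        System.out.println(wire);
        System.out.println(decode(wire).equals(original));   // round-trip check
    }
}
```

With this, Test1 would pass HexArgs.encode(s) instead of s in the exec array, and Test2 would call HexArgs.decode(args[0]) before using the value. It avoids the symptom rather than explaining it, so I would still like to know the proper approach.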

fragorl
  • Looks like a Windows-related problem, as it works perfectly on my Ubuntu. – AlexW Apr 17 '15 at 09:54
  • I think the answer is similar to this one: [what is the character encoding used in eclipse vm arguement?](http://stackoverflow.com/questions/32587876/what-is-the-character-encoding-used-in-eclipse-vm-arguement/35358497) – Beck Yang Feb 26 '16 at 15:46
  • @fragorl did you find any solution to this? – Panayotis Sep 23 '19 at 12:12
