Based on answers to similar questions on SO, it seems that passing Unicode arguments to a Java application on Windows has never worked properly. There is no simple solution, but you can work around the issue using JNA (Java Native Access).
JNA allows you to invoke Windows API functions from Java without writing native code. So in your Java application you can call Win API functions such as GetCommandLineW() and CommandLineToArgvW() directly to access details about the command line used to invoke your program, including any arguments passed. Both of those functions support Unicode.
The code to do this is not trivial, but not overly complex either. The approach below is based on code by Sergey Karpushin in an answer to the question "Passing command line unicode argument to Java code".
For the code to compile you will need a couple of jars: jna.jar and jna-platform.jar. You can get these from the dist directory of the JNA 5.10.0 download, or from Maven.
This approach works both within NetBeans and from the command line on Windows 10, though there are some notable differences:
- From the command line you must run chcp 65001 first, and also specify -Dfile.encoding=UTF-8 in your java.exe call.
- When extracting the parameters returned by CommandLineToArgvW(), you may see a difference between the arguments returned within NetBeans and those returned from the command line. This is not really an issue, since the only arguments you are interested in are those at the end, which come after the argument containing your jar file name.
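To see why the differing leading elements are harmless, here is a minimal stand-alone sketch of the filtering idea that getCommandLineArguments() in the full code uses. It needs no JNA. The class name TrailingArgsSketch, the helper argsAfterJar, and both sample command lines are made up for illustration; what CommandLineToArgvW() actually returns in your environment may look different.

```java
import java.util.Arrays;

public class TrailingArgsSketch {

    // Returns the elements that follow the first element ending in ".jar"
    // (case-insensitive), or the array unchanged if no such element exists.
    static String[] argsAfterJar(String[] fullCommandLine) {
        for (int i = 0; i < fullCommandLine.length; i++) {
            if (fullCommandLine[i].toLowerCase().endsWith(".jar")) {
                return Arrays.copyOfRange(fullCommandLine, i + 1, fullCommandLine.length);
            }
        }
        return fullCommandLine;
    }

    public static void main(String[] args) {
        // Hypothetical command lines: the leading elements differ,
        // but the trailing application argument is the same.
        String[] fromPrompt = {"java", "-Dfile.encoding=UTF-8", "-jar", "ChineseArg.jar", "\u5973\u58eb2"};
        String[] fromIde = {"C:\\jdk\\bin\\java.exe", "-jar", "D:\\dist\\ChineseArg.jar", "\u5973\u58eb2"};

        System.out.println(Arrays.equals(argsAfterJar(fromPrompt), argsAfterJar(fromIde)));
    }
}
```

Whatever precedes the jar file name is discarded, so both sample command lines reduce to the same single argument.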
Here's the code:
package chinesearg;

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.WString;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;
import java.util.ArrayList;
import java.util.List;

// Proof of concept application which uses JNA to correctly process command
// line arguments containing Chinese characters.
//
// Credit to Sergey Karpushin for the approach used in this code.
// See this SO answer: https://stackoverflow.com/a/41923480/2985643
public class ChineseArg {

    private final Kernel32 kernel32 = Native.load("kernel32", Kernel32.class);
    private final Shell32 shell32 = Native.load("shell32", Shell32.class);

    public static void main(String[] args) {
        String test = "\u5973\u58eb";
        System.out.println(test); //works
        String test2 = "女士2";
        System.out.println(test2); //works

        // The arguments as Java received them (non-ASCII characters may be mangled).
        System.out.println("args.length=" + args.length);
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + args[i]);
        }

        // The arguments as read directly from the Windows API.
        String[] params = new ChineseArg().getCommandLineArguments();
        if (params == null) {
            System.out.println("getCommandLineArguments() returned null.");
        } else {
            int count = params.length;
            System.out.println("Number of params=" + count);
            for (int i = 0; i < count; i++) {
                System.out.println("params[" + i + "]=" + params[i]);
            }
        }
    }

    private String[] getCommandLineArguments() {
        System.out.println("Active code page is " + Kernel32.INSTANCE.GetConsoleCP());
        String[] ret = getFullCommandLine();
        if (ret == null) {
            return null;
        }
        // Keep only the elements that follow the jar file name.
        List<String> argsOnly = null;
        for (int i = 0; i < ret.length; i++) {
            if (argsOnly != null) {
                argsOnly.add(ret[i]);
            } else if (ret[i].toLowerCase().endsWith(".jar")) {
                argsOnly = new ArrayList<>();
            }
        }
        if (argsOnly != null) {
            ret = argsOnly.toArray(new String[0]);
        }
        return ret;
    }

    private String[] getFullCommandLine() {
        IntByReference argc = new IntByReference();
        Pointer argv_ptr = shell32.CommandLineToArgvW(kernel32.GetCommandLineW(), argc);
        if (argv_ptr == null) {
            // CommandLineToArgvW() returns NULL on failure.
            return null;
        }
        String[] argv = argv_ptr.getWideStringArray(0, argc.getValue());
        kernel32.LocalFree(argv_ptr);
        return argv;
    }
}

interface Kernel32 extends StdCallLibrary {
    static Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class, com.sun.jna.win32.W32APIOptions.DEFAULT_OPTIONS);
    WString GetCommandLineW();
    int GetConsoleCP();
    Pointer LocalFree(Pointer pointer);
}

interface Shell32 extends StdCallLibrary {
    Pointer CommandLineToArgvW(WString command_line, IntByReference argc);
}
This is sample output when run from the Command Prompt, showing that the first argument ("女士2") is captured correctly:
C:\Users\johndoe>chcp 65001
Active code page: 65001
C:\Users\johndoe>java -Dfile.encoding=UTF-8 -jar "D:\NB126\ChineseArg\dist\ChineseArg.jar" "女士2" "\u5973\u58eb"
女士
女士2
args.length=2
args[0] = ??2
args[1] = \u5973\u58eb
Active code page is 65001
Number of params=2
params[0]=女士2
params[1]=\u5973\u58eb
C:\Users\johndoe>
Notes:
- This code addresses a limitation in the Windows environment and relies on the Windows libraries kernel32 and shell32. I don't know what would happen if this code were run on macOS or Linux.
- Although it's against the spirit of your question, there is an alternative approach: pass arguments to the application as escaped Unicode. It's trivial to unescape the data using Apache's StringEscapeUtils.unescapeJava(). If that is feasible, there is no need for JNA at all.
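To illustrate the escaped-Unicode alternative without pulling in an extra dependency, here is a minimal sketch of the unescaping step. It is not the Apache implementation: the class UnescapeSketch and the method unescapeUnicode are hypothetical names, and the code only handles backslash-u escapes with exactly four hex digits, whereas StringEscapeUtils.unescapeJava() also handles other Java escapes.

```java
public class UnescapeSketch {

    // Replaces each backslash-u escape (backslash, 'u', four hex digits) with
    // the character it encodes. Anything else, including malformed or
    // truncated escapes, is copied through unchanged.
    static String unescapeUnicode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("\\u", i) && i + 6 <= input.length()) {
                int code = 0;
                boolean valid = true;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(input.charAt(k), 16);
                    if (digit < 0) {
                        valid = false;
                        break;
                    }
                    code = code * 16 + digit;
                }
                if (valid) {
                    out.append((char) code);
                    i += 6;
                    continue;
                }
            }
            out.append(input.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unescape every argument exactly as it was received from the JVM.
        for (String arg : args) {
            System.out.println(unescapeUnicode(arg));
        }
    }
}
```

With this approach the caller passes only ASCII on the command line, so the console code page no longer affects how the arguments arrive. Printing the result still depends on the console being able to display the characters.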