Passing command line unicode argument to Java code

Question

I have to pass a command-line argument containing Japanese text to a Java main method. When I type Unicode characters in the command-line window they are displayed as '?????', which is OK, but the value passed to the Java program is also '?????'. How do I get the correct value of the argument passed from the command window? Below is a sample program that writes the value supplied as a command-line argument to a file.

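import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Class name is arbitrary; the original snippet showed only main().
public class WriteArg {
    public static void main(String[] args) {
        String input = args[0];
        String filePath = "C:/Temp/abc.txt";
        File file = new File(filePath);
        // getBytes() with no argument uses the platform default charset
        try (InputStream is = new ByteArrayInputStream(input.getBytes());
             OutputStream out = new FileOutputStream(file)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = is.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}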

Does it change when you change the charset of the console window? Which operating system? — Andreas, Oct 05 '11 at 11:38
Changing the charset of the console window doesn't help. I am using Windows 2000. — Pankaj Agrawal, Oct 05 '11 at 13:19

score 15 · Answer 1 · answered Oct 05 '11 at 12:20

Unfortunately you cannot reliably use non-ASCII characters with command-line apps that use the Windows C runtime's stdlib, like Java (and pretty much all non-Windows-specific scripting languages really).

This is because they read their input and write their output using a locale-specific code page by default, which is never a UTF encoding. Every other modern OS uses UTF-8 here.

Whilst you can change the code page of a terminal to something else using the chcp command, the support for the UTF-8 encoding under chcp 65001 is broken in a few ways that are likely to trip apps up fatally.

If you only need Japanese you could switch to code page 932 (similar to Shift-JIS) by setting your locale (‘language for non-Unicode applications’ in the Regional settings) to Japan. This will still fail for characters that aren't in that code page though.

If you need to get non-ASCII characters through the command line reliably on Windows, you need to call the Win32 API function GetCommandLineW directly to avoid the encode-to-system-code-page layer. Probably you'd want to do that using JNA.

score 4 · Answer 2 · edited Jun 12 '20 at 10:42

Unfortunately, the standard Java launcher has a known, long-standing bug in handling Unicode command-line arguments on Windows, and possibly on some other platforms too. It was still present in Java 7 update 1.

If you are comfortable programming in C/C++, you could try writing your own launcher. A specialized launcher need not be a big undertaking; see the initial example on the JNI Invocation API page.

Another possibility is to use a combination of a Java wrapper and a temporary file for passing Unicode parameters to a Java app. See my blog Java, Xalan, Unicode command line arguments... for more comments and the wrapper code.
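
The temporary-file idea can be sketched roughly as follows. This is not the wrapper from the blog post, only a minimal illustration under stated assumptions: the class name `ArgsFile` and the `--args-file=` option are hypothetical, and whatever starts the JVM is assumed to have written one argument per line, in UTF-8, to the file. The file path itself still travels over the command line, so it should stay ASCII.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the class name and the "--args-file=" option are
// illustrative assumptions, not taken from the blog post.
public class ArgsFile {
    private static final String PREFIX = "--args-file=";

    // If the only argument is "--args-file=<path>", read the real arguments
    // from that file (UTF-8, one per line); otherwise return args unchanged.
    public static String[] resolve(String[] args) throws IOException {
        if (args.length == 1 && args[0].startsWith(PREFIX)) {
            Path file = Paths.get(args[0].substring(PREFIX.length()));
            List<String> lines = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
            // Some editors prepend a byte order mark to UTF-8 files; drop it if present.
            if (!lines.isEmpty() && lines.get(0).startsWith("\uFEFF")) {
                lines.set(0, lines.get(0).substring(1));
            }
            return lines.toArray(new String[0]);
        }
        return args;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : resolve(args)) {
            System.out.println(arg);
        }
    }
}
```

An application would call `ArgsFile.resolve(args)` as the first statement of `main` and use the returned array in place of `args`.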

score 3 · Answer 3 · answered Jan 01 '19 at 12:37

https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8

With insider build 17035 and the April 2018 update (nominal build 17134) for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox appeared for setting the locale code page to UTF-8.

This actually works for me. Without it, no matter what I set chcp to or what I supplied as -Dsun.jnu.encoding, the argument was always garbled.

I had a test class that would just print the argument that is passed to it:

Before:

> java test "üůßβαa"
üußßaa

Interesting that with sun.jnu.encoding=Cp1252, U+03B2 (beta, β) will become a German sharp s (ß) and the Czech ů will become a plain u.
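
The pattern lines up with which characters exist in windows-1252: ü, ß and a are in that code page and arrived unchanged, while ů, β and α are not and were swapped for look-alikes (presumably by Windows' best-fit conversion to the ANSI code page, which is an assumption on my part). A small sketch to check encodability; the class name `Cp1252Check` is made up for illustration, and note that Java's own encoder would substitute `?` rather than a look-alike:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Illustrative helper; the class and method names are not from the original answer.
public class Cp1252Check {

    // Returns the characters of s that windows-1252 cannot represent.
    public static String unencodable(String s) {
        CharsetEncoder enc = Charset.forName("windows-1252").newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!enc.canEncode(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The sample argument from the test above, written with escapes so
        // that the source file encoding does not matter.
        String sample = "\u00FC\u016F\u00DF\u03B2\u03B1a";
        // Print code points instead of glyphs to stay independent of the console.
        for (char c : unencodable(sample).toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```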

> chcp 65001
Active code page: 65001
> java test "üůßβαa"
uaa

Hmm…

> java -Dsun.jnu.encoding=utf-8 test "üůßβαa"
?u??aa

This is not better. And it becomes worse when CJK characters come into play, for example U+4E80 (亀):

> java test "üůßβαa亀"
uaa?
Exception in thread "main" java.nio.file.InvalidPathException: Illegal char <?> at index 6: uaa?
        at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPath.parse(Unknown Source)
        at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
        at java.nio.file.Paths.get(Unknown Source)
        at test.urify(test.java:33)
        at test.urify(test.java:43)
        at test.main(test.java:13)

The class that I used not only prints its argument, it also tries to convert it to a file: URI, and it crashed.

Setting the Windows locale to UTF-8 with the approach quoted above solved this issue.

Unfortunately, it didn’t fix encoding issues with arguments passed to another Java program, the XProc processor XML Calabash. A sample pipeline that takes a value from the command line and inserts it as an attribute into a document yielded this mojibake:

> calabash.bat Untitled3.xpl foo='rαaßβöů亊'
<doc xmlns:c="http://www.w3.org/ns/xproc-step" foo="rÎ±aÃŸÎ²Ã¶Å¯äºŠ">Hello world!</doc>

Adding -Dsun.jnu.encoding=UTF-8 to the Java invocation fixed this:

<doc xmlns:c="http://www.w3.org/ns/xproc-step" foo="rαaßβöů亊">Hello world!</doc>

For completeness, before switching the Windows locale to UTF-8, depending on whether the code page was 1252 or 65001, the invocation yielded different variations of mojibake that -Dsun.jnu.encoding=UTF-8 couldn’t fix.
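
The mojibake in the Calabash output shown earlier has the shape you get when UTF-8 bytes are decoded as windows-1252. That can be reproduced in isolation; the following is only a sketch of the mechanism (the class name `MojibakeDemo` is hypothetical, and I have not traced where exactly Calabash or the JVM performs the wrong decoding):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: reproduces the garbling pattern, not Calabash's code path.
public class MojibakeDemo {

    // Encodes s as UTF-8, then (wrongly) decodes those bytes as windows-1252.
    public static String utf8ReadAsCp1252(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // The attribute value from the pipeline call, written with escapes.
        String original = "r\u03B1a\u00DF\u03B2\u00F6\u016F\u4E8A";
        String garbled = utf8ReadAsCp1252(original);
        // Print code points rather than glyphs to stay independent of the console encoding.
        garbled.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

Each non-ASCII character turns into two or three windows-1252 characters, one per UTF-8 byte, which matches the `foo` attribute in the garbled document.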

So the beta feature to switch the Windows locale finally seems to solve this issue. Some applications might need an additional -Dsun.jnu.encoding=UTF-8, for reasons not thoroughly researched.

This doesn’t solve your years-old issue with Windows 2000. But maybe you have switched to Windows 10 in the meantime.

Ah, btw, I ran your program and it works with the Windows UTF-8 locale setting.

> java test t=r_ä亀
> type C:\Temp\abc.txt
t=r_ä亀

score 0 · Answer 4 · answered Jan 29 '17 at 16:58

You can use JNA to get the arguments. Here is a copy-paste from my code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Logger;

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.WString;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;

public class OsNativeWindowsImpl implements OsNative {
    private static Logger log = Logger.getLogger(OsNativeWindowsImpl.class);

    private Kernel32 kernel32;
    private Shell32 shell32;

    /**
     * Works around the java executable failing to pass arguments containing
     * non-ASCII characters: Cyrillic text gets mangled and the application
     * receives ??????? instead of the real text.
     */
    @Override
    public String[] getCommandLineArguments(String[] fallBackTo) {
        try {
            log.debug("In case we fail fallback would happen to: " + Arrays.toString(fallBackTo));
            String[] ret = getFullCommandLine();
            log.debug("According to Windows API program was started with arguments: " + Arrays.toString(ret));

            List<String> argsOnly = null;
            for (int i = 0; i < ret.length; i++) {
                if (argsOnly != null) {
                    argsOnly.add(ret[i]);
                } else if (ret[i].toLowerCase().endsWith(".jar")) {
                    argsOnly = new ArrayList<>();
                }
            }
            if (argsOnly != null) {
                ret = argsOnly.toArray(new String[0]);
            }

            log.debug("These arguments will be used: " + Arrays.toString(ret));
            return ret;
        } catch (Throwable t) {
            log.error("Failed to use JNA to get current program command line arguments", t);
            return fallBackTo;
        }
    }

    private String[] getFullCommandLine() {
        try {
            // int pid = kernel32.GetCurrentProcessId();
            IntByReference argc = new IntByReference();
            Pointer argv_ptr = getShell32().CommandLineToArgvW(getKernel32().GetCommandLineW(), argc);
            String[] argv = argv_ptr.getWideStringArray(0, argc.getValue());
            getKernel32().LocalFree(argv_ptr);
            return argv;
        } catch (Throwable t) {
            throw new RuntimeException("Failed to get program arguments using JNA", t);
        }
    }

    private Kernel32 getKernel32() {
        if (kernel32 == null) {
            kernel32 = (Kernel32) Native.loadLibrary("kernel32", Kernel32.class);
        }
        return kernel32;
    }

    private Shell32 getShell32() {
        if (shell32 == null) {
            shell32 = (Shell32) Native.loadLibrary("shell32", Shell32.class);
        }
        return shell32;
    }

}

interface Kernel32 extends StdCallLibrary {
    int GetCurrentProcessId();

    WString GetCommandLineW();

    Pointer LocalFree(Pointer pointer);
}

interface Shell32 extends StdCallLibrary {
    Pointer CommandLineToArgvW(WString command_line, IntByReference argc);
}

In addition to the well-known log4j, this code also depends on:

<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
    <version>4.3.0</version>
</dependency>

I am able to add the JNA jar to my project, but I cannot find OsNative in your code. Please help. — senderj, Jan 01 '22 at 03:40
@senderj `OsNative` is just an interface that this class implements, you can create it based on the only public method in this class. — Sergey Karpushin, Feb 10 '22 at 03:32

score 0 · Answer 5 · answered Jan 26 '22 at 00:04

I was having lots of problems with accented characters in Java args; changing the OS locale resolved it!

Mob's answer shed some light: https://stackoverflow.com/a/7660695/8806187

Example on Debian Linux of changing the locale to pt_BR and the charset encoding to ISO-8859-1 (Latin-1, which is close to but not the same as Windows-1252) so that accented characters are accepted in Java command-line arguments:

apt update && apt install -y locales
locale-gen pt_BR 
localedef pt_BR -i pt_BR -f ISO-8859-1
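
To check whether a locale change like this took effect, it can help to dump what the JVM believes about encodings together with the code points of each argument it received. A minimal sketch; the class name `ArgDump` is made up, and `sun.jnu.encoding` is an implementation-specific property that may be absent on some JVMs:

```java
import java.nio.charset.Charset;

// Illustrative diagnostic; not part of any answer's original code.
public class ArgDump {

    // Formats a string as space-separated code points, e.g. "U+0061 U+00E9".
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("default charset:  " + Charset.defaultCharset());
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));
        // Implementation-specific property; prints "null" if the JVM does not set it.
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
        for (String arg : args) {
            System.out.println(codePoints(arg));
        }
    }
}
```

An argument that shows up as U+003F where an accented letter was typed had already been replaced by a literal question mark before `main` ran, so no decoding inside the program can recover it.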

score -1 · Answer 6 · answered Oct 05 '11 at 11:38

The issue is caused by your system locale. Change your locale to Japanese and it will work.

Here's how to do this: http://www.java.com/en/download/help/locale.xml

answered Oct 05 '11 at 11:38 by Mob

Shouldn't we be able to pass any Unicode value, whether Japanese or Korean, without changing the system locale? Right now I don't have the resources to try it, but I will give it a shot. – Pankaj Agrawal Oct 05 '11 at 11:55
This is just a workaround for a single language. What if a person has more than one non-English language on their computer? If other applications (like Notepad) can handle non-English letters, then a Java application must also be able to do so without changing the system locale. See the answer at http://stackoverflow.com/a/41923480/285060, which does not require changing the OS locale. – Sergey Karpushin Jan 29 '17 at 17:03

score -3 · Answer 7 · answered Oct 05 '11 at 11:38

Java works internally with Unicode, so when compiling source code files that use a Chinese encoding such as Big5 or GB2312, you need to specify the encoding to the compiler so that it can properly convert the source to Unicode.

javac -encoding big5 sourcefile.java

or

javac -encoding gb2312 sourcefile.java

Reference: http://www.chinesecomputing.com/programming/java.html

answered Oct 05 '11 at 11:38 by Xavjer

This is completely irrelevant to the question. The question is about invoking an application with Unicode symbols in its args. It is NOT about compiling source code that has Unicode symbols in it. – Sergey Karpushin Jan 29 '17 at 06:10
