
Here is my code:

Process p = Runtime.getRuntime().exec(new String[]{"bash","-c",new String(command.getBytes(),"utf-8")});

I found out that there is no use of new String(command.getBytes(),"utf-8").

How can I set the charset?

My app is a spring boot application.

The full command is:

./xxx.jar --execute "select * from xxx where a = `我`" 

When I execute the command directly in the shell, it runs fine, but when I run it from the Java code the text is garbled.

I set -Dfile.encoding=UTF-8, but it has no effect. Why?

Baby.zhou
  • Does this here https://perlgeek.de/en/article/set-up-a-clean-utf8-environment help in any way? Meaning: are you sure that the bash side of things works as you expect it to? – GhostCat Dec 12 '16 at 09:31
  • @GhostCat, I run the app in `docker`. The command runs fine directly in the shell, but it is garbled when run from the code. – Baby.zhou Dec 12 '16 at 09:41
  • Possible duplicate of [Setting the default Java character encoding?](http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding) – Serg M Ten Dec 12 '16 at 09:46
  • Just in case you don't get a good answer here; there would be a simple workaround: you could write the SQL statement into some file (which should definitely work with UTF-8) and have your application read that file; instead of passing stuff directly. But of course, that is more like a dirty little hack. – GhostCat Dec 12 '16 at 09:46
  • @Baby.zhou Set the locale variables in the shell environment correctly. – vahid Dec 12 '16 at 09:50
  • @SergioMontoro, I set `-Dfile.encoding=UTF-8`, but it has no effect. – Baby.zhou Dec 12 '16 at 10:19
  • You may try to call [locale](https://www.cyberciti.biz/faq/how-to-set-locales-i18n-on-a-linux-unix/) from exec(), but since the problem is within the string of your SQL query, I think you'd better encode the string with escaped character codes like "\\uXXXX" (discussion [here](http://stackoverflow.com/questions/13287145/mysql-querying-for-unicode-entities)). Also, there is more info about encoding SQL at [OWASP](https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet). – Serg M Ten Dec 12 '16 at 14:10

1 Answer

I found out that there is no use of new String(command.getBytes(),"utf-8").

This isn't accurate. The example below runs the same command through exec() with two different character sets (US-ASCII and UTF-8), and the output is clearly affected by the character set.

This program:

  • takes a single input parameter,
  • runs touch to create two files in /tmp/charset-test/ (one per charset), using that input value in each filename,
  • if the input contains non-ASCII characters, the UTF-8 variant should create a file with those characters intact in the filename.

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetTest {

    public static void main(String[] args) throws IOException {
        String input = args[0];
        System.out.println("input: " + input);

        Charset[] charsets = {StandardCharsets.US_ASCII, StandardCharsets.UTF_8};
        for (Charset charset : charsets) {
            String command = "touch /tmp/charset-test/" + input + "-" + charset.toString() + ".txt";
            System.out.println("command: " + command);

            // this is identical to your code, but:
            //  - use Charset objects instead of "utf-8" so I can iterate; "utf-8" also works
            //  - skip assigning to "Process p"
            Runtime.getRuntime().exec(new String[]{
                    "bash", "-c", new String(command.getBytes(), charset)
            });
        }
    }
}

If I run with ASCII input "simple", it creates two files, one for each charset: "simple-US-ASCII.txt" and "simple-UTF-8.txt". This isn't all that interesting, but shows both charsets work normally with basic (ASCII) input.

% rm /tmp/charset-test/*.txt && java CharsetTest.java simple
input: simple
command: touch /tmp/charset-test/simple-US-ASCII.txt
command: touch /tmp/charset-test/simple-UTF-8.txt

% ls /tmp/charset-test
simple-US-ASCII.txt simple-UTF-8.txt

If the input changes to "我", the US-ASCII handling produces the same kind of "garbled" output you describe ("���-US-ASCII.txt"), whereas the UTF-8 version looks good ("我-UTF-8.txt"):

% rm /tmp/charset-test/*.txt && java CharsetTest.java 我    
input: 我
command: touch /tmp/charset-test/我-US-ASCII.txt
command: touch /tmp/charset-test/我-UTF-8.txt

% ls /tmp/charset-test
我-UTF-8.txt     ���-US-ASCII.txt
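
To see why the US-ASCII filename comes out as three replacement characters, here is a minimal sketch of the same encode/decode round trip in isolation. It assumes the encoding side is UTF-8 (the program above uses command.getBytes(), which depends on the platform default); RoundTripDemo and roundTrip are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Encode the text as UTF-8 bytes (an assumption; the original code relies
    // on the platform default), then decode those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String input = "\u6211"; // the character 我, written as an escape
        Charset[] charsets = {StandardCharsets.US_ASCII, StandardCharsets.UTF_8};
        for (Charset cs : charsets) {
            String result = roundTrip(input, cs);
            // Report whether the string survived and how many chars came back.
            System.out.println(cs + ": unchanged=" + result.equals(input)
                    + ", length=" + result.length());
        }
    }
}
```

"我" is three bytes in UTF-8. None of them is valid US-ASCII, so each one decodes to U+FFFD, which is the "���" seen in the listing above. Decoding the same bytes as UTF-8 returns the original string unchanged.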

All of this is to say: your code looks fine. The charset used in new String(command.getBytes(), "utf-8") does affect the string that Runtime.exec() receives, so that call is not a no-op. I can't say what the proper solution would be, but the problem is likely in the environment, not in your code.
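
One environment-side thing to try, following the locale suggestion in the comments, is to set the locale variables explicitly for the child process instead of inheriting whatever the container provides. The sketch below assumes the garbling comes from the child running under a non-UTF-8 locale, and that a locale named C.UTF-8 exists in the image (check with locale -a). LocaleExec and withUtf8Locale are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class LocaleExec {

    // Build a "bash -c <command>" process whose environment requests a UTF-8 locale.
    static ProcessBuilder withUtf8Locale(String command) {
        ProcessBuilder pb = new ProcessBuilder("bash", "-c", command);
        Map<String, String> env = pb.environment();
        // Assumption: "C.UTF-8" is installed in the container; substitute
        // another UTF-8 locale (for example en_US.UTF-8) if it is not.
        env.put("LC_ALL", "C.UTF-8");
        env.put("LANG", "C.UTF-8");
        pb.redirectErrorStream(true); // merge stderr into stdout for easier reading
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Print the locale settings the child process actually sees.
        Process p = withUtf8Locale("locale").start();
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        System.out.print(output);
    }
}
```

Running the locale command through the same code path shows what the child process sees. If it reports POSIX or C when launched without these variables, that would point at the container environment and not at the string handling.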

Kaan