How to encode byte array to String in mixed encodings in Java?

Question

I have a byte array from another system. Once decoded, the string should contain a mix of English, Japanese and Chinese characters. How can I process it? Thanks!

    //the byte[] represents "C注ファイル         PARM 年月日输入不正确         入力文字列"
    byte[] buf = new byte[] { 64, 64, 64, 64, 64, -61, 14, 73, 68, 67, -97,
            67, 71, 67, -126, 67, -81, 15, 64, 64, 64, 64, 64, 64, 64, 64,
            64, -41, -63, -39, -44, 64, 14, 82, -23, 90, -63, 84, -44, 85,
            -29, 84, -22, 73, -70, 91, -98, 84, -74, 15, 64, 64, 64, 64,
            64, 64, 64, 64, 64, 14, 70, 101, 69, -9, 69, -54, 72, -14, 75,
            -76, 15, 64, 64, 64, 64, 64, 64, 64, 64, 64 };
    String japaneseStr = new String(buf,"cp939");// convert to japanese
    System.out.println(japaneseStr);//output:"     C注ファイル         PARM 衷扞唖詑煤証昿翰         入力文字列         "

    String chineseStr = new String(buf,"cp935"); // convert to chinese
    System.out.println(chineseStr); //output:"    C堡ファイル         PARM 年月日输入不正确         ㄅ㈦⑹绑兜         "
    // "注ファイル"       is Japanese
    // "年月日输入不正确"   is Chinese
    // "入力文字列"       is Japanese
    // The result I want is "     C注ファイル         PARM 年月日输入不正确         入力文字列         "

Do you want to encode it (like Base64) or decode it (like UTF-8)? — Thilo, Nov 19 '14 at 03:24
Process it how? What do you have so far? And why did you feel it necessary to spew random characters in to the question before posting? — Paul Richter, Nov 19 '14 at 03:25

icza · Answer 1 · 2014-11-19T05:00:52.170

It doesn't matter which language the characters belong to. What matters is which encoding was used to turn the original String into the byte array.

You can use the following constructors of String to decode a byte array to String:

String(byte[] bytes, String charsetName)

String(byte[] bytes, Charset charset)

You can pass the byte array to the constructor of String and provide the charset (either by name or as a Charset object, see constants in StandardCharsets).

So for example if the original String was encoded using UTF-8 character encoding, you can decode it like this:

String str = new String(source, "UTF-8");

Or:

String str = new String(source, StandardCharsets.UTF_8);
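
To make the difference between the two constructors concrete, here is a minimal sketch. The class and method names (`DecodeBytes`, `decodeByName`, `decodeUtf8`) are made up for illustration. The name-based constructor declares the checked `UnsupportedEncodingException`, which this sketch chooses to rethrow as an unchecked exception; the `Charset`-based constructor declares no checked exception.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DecodeBytes {
    // Hypothetical helper: decode using a charset name. The name is only
    // resolved at run time, so the constructor declares a checked exception.
    static String decodeByName(byte[] source, String charsetName) {
        try {
            return new String(source, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Design choice of this sketch: surface an unknown name as unchecked.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: decode using a Charset constant; nothing to catch.
    static String decodeUtf8(byte[] source) {
        return new String(source, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] source = {97, 98, 99}; // the letters a, b, c in UTF-8 (and ASCII)
        System.out.println(decodeByName(source, "UTF-8"));
        System.out.println(decodeUtf8(source));
    }
}
```

For charsets that have a constant in `StandardCharsets`, the second form is usually the more convenient one.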

Your example:

If your source were indeed encoded using UTF-8, it would look like this:

byte[] source = {-26, -75, -117, -24, -81, -107, -26, -107, -120, -26, -98, -100,
    97, 98, 99, 100, 101, -26, -106, -80, -25, -108, -97, -25, -108, -93, -25,
    -82, -95, -25, -112, -122, -29, -126, -73, -29, -126, -71, -29, -125, -122,
    -29, -125, -96};

The following code:

String str = new String(source, StandardCharsets.UTF_8);
System.out.println(str);

Prints:

测试效果abcde新生産管理システム
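
The same point can be checked as a round trip. The sketch below assumes UTF-8 on both sides and uses a shorter mixed string than the one above; the class name `RoundTrip` and its helpers are hypothetical. The string literal is written with Unicode escapes so that the file compiles regardless of the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    // Hypothetical helpers: encode and decode with the same charset.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, "abc", and two katakana characters.
        String original = "\u6D4B\u8BD5abc\u30B7\u30B9";

        byte[] bytes = encode(original);
        System.out.println(Arrays.toString(bytes));

        // Same charset for decoding: the original string comes back.
        System.out.println(decode(bytes).equals(original));

        // Different charset for decoding: every byte becomes its own char,
        // so the result no longer matches the original.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(original));
    }
}
```

The language of the characters plays no role here; only the pairing of the encoding and decoding charset does.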

edited Nov 19 '14 at 05:00

answered Nov 19 '14 at 03:51


Thanks for your reply! But the source uses a different encoding; it is not encoded in UTF-8. – lane_yang Nov 19 '14 at 04:08
You will have to find out what encoding was used for the source. – Jeff Olson Nov 19 '14 at 04:16
Either ask the person who is providing the source (the easiest way), or maybe this question will help: http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream – Jeff Olson Nov 19 '14 at 04:22
@lane_yang As mentioned by Jeff Olson, you have to know which encoding was used, and you have to use the same encoding to decode the byte array. If you don't know it, ask the source (server you get the byte array from, or its developer/admin). Or you can experiment and try multiple encodings until you see the desired result. – icza Nov 19 '14 at 05:10
In the other systems, the input may be Japanese and Chinese characters; in the end, they are stored as a byte array that was encoded using different encodings. – lane_yang Nov 19 '14 at 06:55
@lane_yang Again, it's not the language that matters but the encoding that is used to create the byte array from the string. The same encoding has to be used for decoding and all will be good. If for example UTF-8 is used everywhere, it doesn't matter if the string contains Japanese or Chinese characters, after converting it to byte array and back to string the decoded string will also contain the exact same Japanese and Chinese characters. – icza Nov 19 '14 at 06:59
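
The suggestion in the comments to experiment with several encodings could be sketched roughly as follows. This is only an illustration, not a detection algorithm: the class `TryCharsets` and the method `decodeWithEach` are hypothetical, and the candidate names are placeholders. A strict `CharsetDecoder` is used so that byte sequences that are invalid in a candidate charset are reported instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.LinkedHashMap;
import java.util.Map;

public class TryCharsets {
    // Hypothetical helper: decode the bytes with every candidate charset.
    // A null value means the charset is unsupported on this JVM or the
    // bytes are not valid in that charset.
    static Map<String, String> decodeWithEach(byte[] source, String... names) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                results.put(name, null);
                continue;
            }
            try {
                String text = Charset.forName(name).newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(source))
                        .toString();
                results.put(name, text);
            } catch (CharacterCodingException e) {
                results.put(name, null);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] source = {-26, -75, -117, -24, -81, -107}; // first six bytes of the UTF-8 example
        Map<String, String> results = decodeWithEach(source, "UTF-8", "US-ASCII", "ISO-8859-1");
        for (Map.Entry<String, String> entry : results.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Names such as the question's `cp939` and `cp935` could be passed the same way, provided the running JVM supports them. A successful decode only shows that the bytes are valid in that charset, so the output still has to be checked by eye against the expected text.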
