My team has a few Java ETL tools running on a server. One of the tools contains the following code:
StringBuilder responseContents = new StringBuilder();
byte[] buffer = new byte[2048];
int read;
try
{
    // zipInputStream is a java.util.zip.ZipInputStream; only the first entry is read
    if (zipInputStream.getNextEntry() != null)
    {
        while ((read = zipInputStream.read(buffer, 0, buffer.length)) >= 0)
        {
            // Decode each chunk explicitly as UTF-8 instead of the platform default
            responseContents.append(new String(buffer, 0, read, StandardCharsets.UTF_8));
        }
    }
}
catch (IOException e)
{
    // error handling elided here
}
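One subtlety in that snippet, separate from the encoding bug described below: each 2048-byte chunk is decoded independently, so a multi-byte UTF-8 sequence (such as a Korean character) that straddles a buffer boundary would be mangled even with the explicit charset. Wrapping the stream in a java.io.InputStreamReader should avoid that, since the decoder carries partial sequences across reads. A sketch, assuming only the first entry matters, as above:

try
{
    if (zipInputStream.getNextEntry() != null)
    {
        // The reader buffers incomplete multi-byte sequences between read() calls
        Reader reader = new InputStreamReader(zipInputStream, StandardCharsets.UTF_8);
        char[] chunk = new char[2048];
        int count;
        while ((count = reader.read(chunk, 0, chunk.length)) >= 0)
        {
            responseContents.append(chunk, 0, count);
        }
    }
}
catch (IOException e)
{
    // error handling elided here
}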
The zipInputStream contains JSON encoded in UTF-8. Java strings are internally UTF-16, so the bytes have to be decoded at some point. Originally, StandardCharsets.UTF_8 was not passed to the String constructor. We ran into a situation where the JSON contained some Korean characters: on my machine (without the explicit charset argument) the encoding of the bytes happened to be assumed correctly, but when the same JAR was run on the server it was not, and the Korean characters were decoded incorrectly.

Neither my machine nor the server has the NLS_LANG environment variable set, and both machines run the same Java version. What variables determine the "default/assumed" byte array encoding in Java?
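For what it's worth, here is a quick way to see what each JVM will assume (CharsetCheck is just an illustrative name):

import java.nio.charset.Charset;

public class CharsetCheck
{
    public static void main(String[] args)
    {
        // The charset new String(byte[]) falls back to when none is specified
        System.out.println("defaultCharset: " + Charset.defaultCharset());
        // The system property the default is typically derived from at JVM startup
        System.out.println("file.encoding:  " + System.getProperty("file.encoding"));
    }
}

Running this on my machine and on the server should show whether the two JVMs actually disagree.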