My team has a few Java ETL tools running on a server. One of the tools contains the following code:
StringBuilder responseContents = new StringBuilder();
byte[] buffer = new byte[2048];
int read;
try
{
    // zipInputStream is a java.util.zip.ZipInputStream; only the first entry is read
    if (zipInputStream.getNextEntry() != null)
    {
        while ((read = zipInputStream.read(buffer, 0, buffer.length)) >= 0)
        {
            // Decode each chunk explicitly as UTF-8 instead of the platform default
            responseContents.append(new String(buffer, 0, read, StandardCharsets.UTF_8));
        }
    }
}
catch (IOException e)
{
    // error handling elided here
}
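One subtlety in that snippet, separate from the encoding bug described below: each 2048-byte chunk is decoded independently, so a multi-byte UTF-8 sequence (such as a Korean character) that straddles a buffer boundary would be mangled even with the explicit charset. Wrapping the stream in a java.io.InputStreamReader should avoid that, since the decoder carries partial sequences across reads. A sketch, assuming only the first entry matters, as above:

try
{
    if (zipInputStream.getNextEntry() != null)
    {
        // The reader buffers incomplete multi-byte sequences between read() calls
        Reader reader = new InputStreamReader(zipInputStream, StandardCharsets.UTF_8);
        char[] chunk = new char[2048];
        int count;
        while ((count = reader.read(chunk, 0, chunk.length)) >= 0)
        {
            responseContents.append(chunk, 0, count);
        }
    }
}
catch (IOException e)
{
    // error handling elided here
}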
The zipInputStream contains JSON encoded in UTF-8. Java strings are internally UTF-16, so the bytes have to be decoded at some point. Originally, StandardCharsets.UTF_8 was not passed to the String constructor. We ran into a situation where the JSON contained some Korean characters: on my machine (without the explicit charset argument) the encoding of the bytes happened to be assumed correctly, but when the same JAR was run on the server it was not, and the Korean characters were decoded incorrectly.

Neither my machine nor the server has the NLS_LANG environment variable set, and both machines run the same Java version. What variables determine the "default/assumed" byte array encoding in Java?
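For what it's worth, here is a quick way to see what each JVM will assume (CharsetCheck is just an illustrative name):

import java.nio.charset.Charset;

public class CharsetCheck
{
    public static void main(String[] args)
    {
        // The charset new String(byte[]) falls back to when none is specified
        System.out.println("defaultCharset: " + Charset.defaultCharset());
        // The system property the default is typically derived from at JVM startup
        System.out.println("file.encoding:  " + System.getProperty("file.encoding"));
    }
}

Running this on my machine and on the server should show whether the two JVMs actually disagree.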