Java String decoding differs between IDE and runnable JAR

Question

In my program I encrypt a string and decrypt it again. When I run it in IntelliJ, it works fine, but when I build a jar, some characters don't decrypt correctly. E.g. "ä" becomes "Ã¤". I learned that this happens when text is encoded as UTF-8 and decoded as ISO 8859-1. (But my file is already encoded as UTF-8.)

Can anybody explain why there is a difference in encryption/decoding between running the program in IntelliJ and running it as a jar?

package main;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;

public class Main {

    public static void main(String[] args) throws Exception {
        SecretKeySpec password = createKey("safePassword");

        String message = "hello ä ö ü ß";

        String encryptedMessage = encrypt(message, password);
        //this output is the same in IntelliJ and as a jar
        System.out.println(encryptedMessage);

        byte[] decryptedBytes = decrypt(encryptedMessage, password);
        //this output gets messed up when I run it as a jar but not in Intellij
        System.out.println(new String(decryptedBytes));
        //this output works both ways
        System.out.println(new String(decryptedBytes, StandardCharsets.UTF_8));
    }

    public static String encrypt(String message, SecretKeySpec key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] encrypted = cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(encrypted);
    }

    public static byte[] decrypt(String encryptedMessage, SecretKeySpec key) throws Exception {
        byte[] message = Base64.getDecoder().decode(encryptedMessage);
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(message);
    }

    public static SecretKeySpec createKey(String key) throws Exception {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        keyBytes = sha.digest(keyBytes);
        keyBytes = Arrays.copyOf(keyBytes, 16);
        return new SecretKeySpec(keyBytes, "AES");
    }

}

Output in IntelliJ:

8nno4HGKG4/Ni/Sxun+s3roOAaav+eXT4kd0ivgZFBA=
hello ä ö ü ß
hello ä ö ü ß

Output from jar:

8nno4HGKG4/Ni/Sxun+s3roOAaav+eXT4kd0ivgZFBA=
hello Ã¤ Ã¶ Ã¼ Ã?
hello ä ö ü ß

score 0 · Answer 1 · answered Apr 04 '21 at 11:23

    System.out.println(new String(decryptedBytes));

Your 'bytes' are a UTF-8 representation.

The above line constructs a string by interpreting the bytes using the platform's default charset, which is not necessarily UTF-8.

When the default charset is something other than UTF-8, the resulting string is garbage.
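
To make the effect concrete, here is a minimal sketch (the class and method names are made up for illustration) that encodes "ä" as UTF-8 and then decodes the same bytes with two different charsets. It prints code points rather than the characters themselves, so the result does not depend on the console encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the given charset.
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    // Renders a string as space-separated code points, e.g. "U+00E4".
    static String toCodePoints(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        s.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // The letter a-umlaut, written as an escape so the source file encoding is irrelevant.
        String original = "\u00e4";

        // Correct charset: one character, U+00E4.
        System.out.println(toCodePoints(decodeUtf8BytesAs(original, StandardCharsets.UTF_8)));

        // Wrong charset: each of the two UTF-8 bytes becomes its own character.
        System.out.println(toCodePoints(decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1)));
    }
}
```

The second line prints `U+00C3 U+00A4`, which are the characters "Ã" and "¤", matching the "Ã¤" seen in the jar output.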

Documentation link

user15187356

I understand, that makes sense. Thank you. Nevertheless, that doesn't explain the difference between running it in IntelliJ and as a jar. Why would they use different standard charsets? – Silas Apr 04 '21 at 11:28
It's under user control. I use Linux; my standard locale is UTF-8. See also [this question](https://stackoverflow.com/questions/8809098/how-do-i-set-the-default-locale-in-the-jvm) – user15187356 Apr 04 '21 at 11:32

Mark Rotteveel · Answer 2 · 2021-04-04T11:50:31.587

You should not rely on the default character set, as you're doing with new String(decryptedBytes). Depending on the OS and the exact way Java is started, your application may end up with a different default character set. Instead, always specify the character set explicitly: use String.getBytes(Charset charset) or String.getBytes(String charsetName), and new String(byte[] bytes, Charset charset) or new String(byte[] bytes, String charsetName).
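
As a minimal sketch of that advice (the class and helper names here are hypothetical, not part of the question's code), both directions of the conversion name UTF-8 explicitly, so the result no longer depends on how the JVM was launched:

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetRoundTrip {

    // String -> bytes, always UTF-8 regardless of the platform default.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String, always UTF-8 regardless of the platform default.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same text as in the question, written with escapes so the
        // source file encoding cannot interfere.
        String message = "hello \u00e4 \u00f6 \u00fc \u00df";

        byte[] bytes = toBytes(message);
        System.out.println(bytes.length);                      // 17: each umlaut and the sharp s take 2 bytes
        System.out.println(fromBytes(bytes).equals(message));  // true
    }
}
```

In the question's code, this corresponds to the third println, which already passes StandardCharsets.UTF_8 and works in both environments.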

On Windows, some ways of launching a Java application will default to a simple single-byte character set, and some other ways default to UTF-8. For example, on my machine, it will default to Cp1252 (windows-1252) in some situations (e.g. using java -jar from the command prompt with Java 8, while using java -jar with Java 16 will use UTF-8). However, the way IntelliJ launches Java will make it default to UTF-8 even when using Java 8. Similar things apply for Linux, though there it is more common that the default locale uses UTF-8.
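
One way to check which default a particular launch method ends up with is to print it. The following is a small sketch (the class name is made up) using Charset.defaultCharset() and the file.encoding system property:

```java
import java.nio.charset.Charset;

class DefaultCharsetReport {

    // Builds a one-line summary of the charset this JVM uses by default.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this once from the IDE and once via java -jar and compare the output.
        System.out.println(describe());
    }
}
```

If the two launches report different values, that accounts for the differing output of new String(decryptedBytes). Passing -Dfile.encoding=UTF-8 on the command line, for example java -Dfile.encoding=UTF-8 -jar app.jar (where app.jar is a placeholder name), generally forces UTF-8 as the default, and IDEs may add such an option themselves. Specifying the charset in code remains the more robust fix.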
