
Consider the following code:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import java.security.SecureRandom;

public class AES_Mod_Speed {
    // AES parameters
    private static final int AES_KEY_SIZE = 128; // in bits
    private static final int AES_COUNTER_SIZE = 16; // in bytes
    private static final int GCM_NONCE_LENGTH = 12; // in bytes. 12 is the recommended value.
    private static final int GCM_TAG_LENGTH = 16 * 8; // in bits

    public static void main(String[] args) throws Exception {
        SecureRandom sr = new SecureRandom();

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(AES_KEY_SIZE);
        SecretKey key = kg.generateKey();

        byte[] counter = new byte[AES_COUNTER_SIZE];
        Cipher aes_ctr = Cipher.getInstance("AES/CTR/NoPadding");

        byte[] nonce = new byte[GCM_NONCE_LENGTH];
        Cipher aes_gcm = Cipher.getInstance("AES/GCM/NoPadding");

        for (int i = 0; i < 10; i++) {
            sr.nextBytes(counter);
            aes_ctr.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(counter));
            speedTest(aes_ctr);
        }

        System.out.println("-----------------------------------------");

        for (int i = 0; i < 10; i++) {
            sr.nextBytes(nonce);
            aes_gcm.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_LENGTH, nonce));
            speedTest(aes_gcm);
        }

    }

    private static void speedTest(Cipher cipher) throws Exception {
        byte[] ptxt = new byte[1 << 26];
        long start, end;

        start = System.nanoTime();
        cipher.doFinal(ptxt);
        end = System.nanoTime();


        System.out.printf("%s took %f seconds.\n",
                cipher.getAlgorithm(),
                (end - start) / 1E9);
    }
}

Result (Java 11.0.2):


AES/CTR/NoPadding took 0.259894 seconds.
AES/CTR/NoPadding took 0.206136 seconds.
AES/CTR/NoPadding took 0.247764 seconds.
AES/CTR/NoPadding took 0.196413 seconds.
AES/CTR/NoPadding took 0.181117 seconds.
AES/CTR/NoPadding took 0.194041 seconds.
AES/CTR/NoPadding took 0.181889 seconds.
AES/CTR/NoPadding took 0.180970 seconds.
AES/CTR/NoPadding took 0.180546 seconds.
AES/CTR/NoPadding took 0.179797 seconds.
-----------------------------------------
AES/GCM/NoPadding took 0.961051 seconds.
AES/GCM/NoPadding took 0.952866 seconds.
AES/GCM/NoPadding took 0.963486 seconds.
AES/GCM/NoPadding took 0.963280 seconds.
AES/GCM/NoPadding took 0.961424 seconds.
AES/GCM/NoPadding took 0.977850 seconds.
AES/GCM/NoPadding took 0.961449 seconds.
AES/GCM/NoPadding took 0.957542 seconds.
AES/GCM/NoPadding took 0.967129 seconds.
AES/GCM/NoPadding took 0.959292 seconds.

This is odd, since GCM is more than five times slower than CTR when encrypting 1<<26 bytes (64 MiB). For comparison, I ran the OpenSSL 1.1.1a speed test with `openssl speed -evp aes-128-ctr` and `openssl speed -evp aes-128-gcm` and got the following results:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-ctr     463059.16k  1446320.32k  3515070.12k  5182218.92k  6063797.59k  6210150.19k
aes-128-gcm     480296.99k  1088337.47k  2531854.17k  4501395.11k  5940079.27k  6087589.89k

One can see that, for larger inputs, GCM is only marginally slower than CTR (about 2% at 8192 and 16384 bytes).
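To put the two benchmarks on the same scale, the Java timings can be converted into OpenSSL's unit (1000s of bytes per second). The sketch below does only that arithmetic; using the last timing of each Java loop and OpenSSL's 16384-byte column is an assumption about which values are most representative, and the class name is hypothetical.

```java
// Converts the Java timings from the question into OpenSSL's unit
// (1000s of bytes per second) so the two benchmarks can be compared.
public class ThroughputComparison {
    // Plaintext size used by speedTest: 1 << 26 bytes = 64 MiB.
    static final double BYTES = 1 << 26;

    // Throughput in thousands of bytes per second, as in OpenSSL's table.
    static double kBytesPerSecond(double seconds) {
        return BYTES / seconds / 1000.0;
    }

    public static void main(String[] args) {
        // Last (most warmed-up) timing of each loop in the Java 11.0.2 run.
        double javaCtr = kBytesPerSecond(0.179797);
        double javaGcm = kBytesPerSecond(0.959292);
        // OpenSSL 1.1.1a results for 16384-byte blocks (already in k/s).
        double opensslCtr = 6210150.19;
        double opensslGcm = 6087589.89;

        System.out.printf("Java:    CTR %.0fk/s, GCM %.0fk/s, CTR/GCM = %.2f%n",
                javaCtr, javaGcm, javaCtr / javaGcm);
        System.out.printf("OpenSSL: CTR %.0fk/s, GCM %.0fk/s, CTR/GCM = %.2f%n",
                opensslCtr, opensslGcm, opensslCtr / opensslGcm);
    }
}
```

With these inputs, the CTR/GCM throughput ratio comes out at about 5.3 for Java and about 1.02 for OpenSSL.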

Why is the Java implementation of AES-GCM so much slower than AES-CTR? Am I missing something?

PS: I used Java JMH for microbenchmarking as well, and the results were similar.

Please also see this answer, where the OP explains how AES performance issues were solved in earlier JDKs.

Sadeq Dousti
  • Have you eliminated all outliers? E.g. is this a debug build? Are you running the benchmark a few times each before you start recording and averaging? It could be that you are recording an O(1) setup step that won't scale with the size of the encrypted data. – Luke Joshua Park Feb 12 '19 at 23:04
  • @LukeJoshuaPark: Thanks for the prompt response. It's not a debug build, and I run the code 10 times in a for-loop to be sure that the warm-up is no problem. The results were consistent. Please see the modified code. – Sadeq Dousti Feb 12 '19 at 23:38



This is the same problem as described in this answer.

The encryption method is not called enough times to get JIT-compiled, so what you see is the result of purely interpreted execution. Try measuring more iterations that encrypt smaller arrays, or just add a dummy loop to "warm up" the compiler.

For example, insert the following loop before the main benchmarking loop. It will execute doFinal enough times to make sure it gets compiled.

    // Warm-up
    for (int i = 0; i < 100000; i++) {
        sr.nextBytes(nonce);
        aes_gcm.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_LENGTH, nonce));
        aes_gcm.doFinal(new byte[16]);
    }
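The other suggestion, measuring more iterations on smaller arrays, can be sketched as follows. Instead of one `doFinal` call on 64 MiB, it feeds the same 64 MiB through `Cipher.update` in 16 KiB chunks, so the hot encryption path is entered thousands of times per pass and has a chance to be JIT-compiled. This swaps the question's single `doFinal` for a streaming `update` loop; the chunk size, the number of passes and the class name `ChunkedGcmSpeed` are arbitrary assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

public class ChunkedGcmSpeed {
    static final int TOTAL_SIZE = 1 << 26; // 64 MiB, as in the question
    static final int CHUNK_SIZE = 1 << 14; // 16 KiB per update() call (arbitrary)
    static final int PASSES = 10;          // arbitrary

    // Encrypts TOTAL_SIZE zero bytes in CHUNK_SIZE pieces under a fresh
    // random nonce and returns the elapsed time in seconds.
    static double encryptOnce(Cipher gcm, SecretKey key, SecureRandom sr) throws Exception {
        byte[] nonce = new byte[12];
        sr.nextBytes(nonce); // a GCM nonce must never be reused with the same key
        gcm.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));

        byte[] chunk = new byte[CHUNK_SIZE];
        // Large enough for one update() output or for the 16-byte tag from doFinal().
        // The ciphertext is overwritten on every call; only the timing matters here.
        byte[] out = new byte[gcm.getOutputSize(CHUNK_SIZE)];

        long start = System.nanoTime();
        for (int off = 0; off < TOTAL_SIZE; off += CHUNK_SIZE) {
            gcm.update(chunk, 0, CHUNK_SIZE, out, 0); // 4096 calls per pass
        }
        gcm.doFinal(out, 0); // finishes the message and writes the authentication tag
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
        SecureRandom sr = new SecureRandom();

        for (int i = 0; i < PASSES; i++) {
            System.out.printf("pass %d: AES/GCM/NoPadding took %f seconds.%n",
                    i, encryptOnce(gcm, key, sr));
        }
    }
}
```

If the JIT explanation holds, the first pass or two should be the slowest; the actual numbers will depend on the machine and the JDK.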

Once the JIT compiler does its job, the subsequent benchmark results will be much better. In fact, the key AES encryption methods are intrinsics in the JDK: the HotSpot JVM has special implementations of them written in optimized assembly that use the AVX and AES-NI instruction sets.

On my laptop, the benchmark became an order of magnitude faster after warm-up:

AES/GCM/NoPadding took 0.108993 seconds.
AES/GCM/NoPadding took 0.089832 seconds.
AES/GCM/NoPadding took 0.063606 seconds.
AES/GCM/NoPadding took 0.061044 seconds.
AES/GCM/NoPadding took 0.073603 seconds.
AES/GCM/NoPadding took 0.063733 seconds.
AES/GCM/NoPadding took 0.058680 seconds.
AES/GCM/NoPadding took 0.058996 seconds.
AES/GCM/NoPadding took 0.058327 seconds.
AES/GCM/NoPadding took 0.058664 seconds.
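As for the intrinsics mentioned above, one way to see whether they are enabled on a given machine is to query the VM flags at run time through `HotSpotDiagnosticMXBean`. The flag names below (`UseAES`, `UseAESIntrinsics`, `UseAESCTRIntrinsics`, `UseGHASHIntrinsics`) are HotSpot flags I believe exist in JDK 11, but treat them as an assumption: the sketch reports `unavailable` for any flag the running JVM does not recognize, and the class name is hypothetical. A flag being `true` only means the intrinsic is available; as explained above, it pays off once the calling code has been JIT-compiled.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class AesIntrinsicFlags {
    // HotSpot flags assumed to control the AES-related intrinsics.
    static final String[] FLAGS = {
        "UseAES", "UseAESIntrinsics", "UseAESCTRIntrinsics", "UseGHASHIntrinsics"
    };

    // Returns the current value of a VM flag, or "unavailable" if the running
    // JVM does not expose HotSpotDiagnosticMXBean or does not know the flag.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "unavailable" : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

`java -XX:+PrintFlagsFinal -version` shows the same values from the command line.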
apangin
  • Great answer, thanks! Executing `java -XX:+PrintFlagsFinal -version | grep CompileThreshold`, I noticed that in my environment, `CompileThreshold = 10,000` and `Tier3AOTCompileThreshold = Tier4CompileThreshold = 15,000`. The total sum is 10,000 + 15,000 + 15,000 = 40,000. I tried warmup with 40,000 iterations, and it was OK. But anything less than that was either "always too low" or "sometimes too low". Is my line of reasoning correct? – Sadeq Dousti Feb 13 '19 at 04:50
  • @M.S.Dousti `CompileThreshold` [does not matter](https://stackoverflow.com/a/35614237/3448419). C1 compilation usually starts after a few hundred invocations, and C2 after a few thousand. But the compiler works in the background while the main program continues to run in the interpreter. When the compiled code is installed depends on many factors, including CodeCache capacity, the number of available CPUs and the current system load. There is no guarantee about the number of invocations that will ensure switching to compiled code. – apangin Feb 13 '19 at 16:01
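Since no invocation count is guaranteed to be enough, one alternative is to warm up for a wall-clock budget rather than a fixed number of calls (JMH also structures its warm-up as timed iterations). The sketch below is only an illustration: the 5-second budget, the 16-byte warm-up input and the names `TimedWarmup`, `Workload`, `warmUp` and `gcmEncrypt` are hypothetical choices, and this reduces the risk of measuring interpreted code without guaranteeing anything. Running with the HotSpot flag `-XX:+PrintCompilation` is one way to watch when methods actually get compiled.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;
import java.util.concurrent.TimeUnit;

public class TimedWarmup {
    // A unit of work to warm up and then time.
    interface Workload {
        void run() throws Exception;
    }

    // Runs the workload repeatedly until at least budgetMillis have elapsed;
    // returns how many calls were made.
    static long warmUp(Workload w, long budgetMillis) throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        long calls = 0;
        do {
            w.run();
            calls++;
        } while (System.nanoTime() - deadline < 0); // overflow-safe comparison
        return calls;
    }

    // Times one call of the workload, in seconds.
    static double time(Workload w) throws Exception {
        long start = System.nanoTime();
        w.run();
        return (System.nanoTime() - start) / 1e9;
    }

    // GCM encryption of `input` under a fresh random nonce per call,
    // mirroring the warm-up loop in the answer.
    static Workload gcmEncrypt(Cipher gcm, SecretKey key, SecureRandom sr, byte[] input) {
        byte[] nonce = new byte[12];
        return () -> {
            sr.nextBytes(nonce); // GCMParameterSpec copies the array
            gcm.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            gcm.doFinal(input);
        };
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
        SecureRandom sr = new SecureRandom();

        long calls = warmUp(gcmEncrypt(gcm, key, sr, new byte[16]), 5_000); // 5 s budget
        System.out.println("warm-up calls: " + calls);

        Workload big = gcmEncrypt(gcm, key, sr, new byte[1 << 26]); // 64 MiB
        for (int i = 0; i < 10; i++) {
            System.out.printf("AES/GCM/NoPadding took %f seconds.%n", time(big));
        }
    }
}
```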