
I am using the standard technique of an RSA key pair whose public key encrypts a random 16-byte key, which in turn encrypts my data using AES/CBC/PKCS5Padding. I am using Bouncy Castle for my needs. I need to encrypt streams of usually large data (512 MB+). Running performance tests to understand the overhead of encryption, I see that encryption is nearly 30-40% more expensive than handling the unencrypted data. Is this expected?

Sample code

public InputStream encryptStream(InputStream streamToEncrypt, byte[] key, byte[] iv, byte[] encryptedKey /* 256 bytes */) {

        final Cipher cipher = getCipher(Cipher.ENCRYPT_MODE, key, iv);
        byte[] civ = cipher.getIV();
        ...
        ByteArrayInputStream ivEncryptedKeyStream = new ByteArrayInputStream(ivEncryptedKeyArray);
        CipherInputStream encrypted = new CipherInputStream(streamToEncrypt, cipher);

        return new SequenceInputStream(ivEncryptedKeyStream, encrypted);
    }
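
For context, a minimal sketch of what the elided getCipher and header assembly could look like (hypothetical, not the actual code; it assumes the header is simply the 16-byte IV followed by the 256-byte RSA-encrypted key):

import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

private static Cipher getCipher(int mode, byte[] key, byte[] iv) {
    try {
        // No explicit provider: the first registered provider that supports
        // the transformation (normally SunJCE) is picked.
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    } catch (GeneralSecurityException e) {
        throw new IllegalStateException(e);
    }
}

// Inside encryptStream: assumed header layout is IV, then encrypted key.
byte[] ivEncryptedKeyArray = new byte[civ.length + encryptedKey.length];
System.arraycopy(civ, 0, ivEncryptedKeyArray, 0, civ.length);
System.arraycopy(encryptedKey, 0, ivEncryptedKeyArray, civ.length, encryptedKey.length);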

elsewhere

 InputStream encryptedStream = ...encryptStream(plainStream, key, iv, encKey);
 IOUtils.copyLarge(encryptedStream, outputStream);

I have played around with Java server args and confirmed that the AES-NI instruction set is on, etc. I just wanted to get an idea of what overhead I should expect when encrypting large streams.
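
As a sanity check (illustrative snippet, not part of the original code), printing which provider actually serves the AES cipher can rule out an accidental Bouncy Castle pick-up; the HotSpot flags -XX:+UseAES and -XX:+UseAESIntrinsics only help when the SunJCE implementation is the one in use:

import javax.crypto.Cipher;

public class ProviderCheck {
    public static void main(String[] args) throws Exception {
        // If Bouncy Castle was registered ahead of SunJCE, this prints "BC"
        // and the software-only AES implementation is being used.
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        System.out.println("AES served by: " + cipher.getProvider().getName());
    }
}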

EDIT: Corrected the information above; I am using Bouncy Castle just for the key-pair generation. For the AES encryption I am using SunJCE as the security provider.

sunny
  • Have you checked if your Java version uses AES-NI? See [AES-NI intrinsics enabled by default?](http://stackoverflow.com/questions/23058309/aes-ni-intrinsics-enabled-by-default) – Robert Jul 23 '16 at 12:24
  • Thanks. I have tried with these arguments and see a very slight increase in performance with the -server arg also added. I will look more into this as well. – sunny Jul 23 '16 at 20:20

1 Answer


The idea of using Bouncy Castle for everything that is already in the Oracle Java API escapes me. AES-NI won't be enabled for Bouncy Castle, as Bouncy is a software-only library; Java won't magically replace the AESFastEngine with hardware instructions. Just use the Oracle implementation if you want speed.
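
To illustrate the split (a sketch, assuming Bouncy Castle is on the classpath): request BC explicitly only where you want it, and let the default lookup hand AES to SunJCE:

import java.security.KeyPairGenerator;
import java.security.Security;
import javax.crypto.Cipher;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class ProviderSplit {
    public static void main(String[] args) throws Exception {
        Security.addProvider(new BouncyCastleProvider());

        // Bouncy Castle explicitly, only for the RSA key-pair generation.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA", "BC");
        kpg.initialize(2048);

        // No provider argument: the highest-priority provider wins, normally
        // SunJCE, whose AES benefits from AES-NI intrinsics.
        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        System.out.println("RSA keygen: " + kpg.getProvider().getName()
                + ", AES: " + aes.getProvider().getName());
    }
}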

As for the overhead: yes, overhead should be expected. How large the percentage is compared to other calculations depends, of course, on the machine and on the performance of those other calculations. 40% could be a reasonable expectation, though.

Notes:

  • the latest Java versions also use CPU instructions for BigInteger operations, so that might also speed up RSA operations;
  • using PKCS#1 padding for RSA and/or CBC padding for AES makes your ciphertext vulnerable to padding oracle attacks (in case those are applicable, e.g. in transport protocols);
  • be sure you use a sufficiently large test set; JIT compilation and optimization may kick in relatively late (see the sketch after this list).
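
To make the last point concrete, here is a minimal throughput sketch (names and sizes are mine, not from the question) that warms the cipher up before timing; for serious numbers a harness such as JMH is preferable:

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesThroughput {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16], iv = new byte[16], buf = new byte[1 << 20];
        SecureRandom random = new SecureRandom();
        random.nextBytes(key);
        random.nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));

        // Warm-up: give JIT compilation time to optimize the hot path.
        for (int i = 0; i < 100; i++) cipher.update(buf);

        int mib = 512;
        long start = System.nanoTime();
        for (int i = 0; i < mib; i++) cipher.update(buf); // 512 x 1 MiB
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("~%.0f MiB/s%n", mib / secs);
    }
}
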
Maarten Bodewes
  • Thanks for pointing that out, I have corrected the info. I am actually using the Oracle Java API for encrypting the streams. I will try out larger test sets, but I immediately see the difference between, say, a sample size of 10 vs 10000 files of 3 MB each. In my test with 10000 files the performance impact is around 30%. I will next try to benchmark against larger files, but in theory would larger files have a greater impact, or would it not matter? – sunny Jul 23 '16 at 20:17
  • 1
    @sunny No, as long as you don't use a different VM then the performance benefit is likely small. 30 GB seems plenty to test already, the code will be optimized by then. There might be some impact wrt the key wrapping and AES key schedule, but that's unlikely to influence your findings overly much. – Maarten Bodewes Jul 23 '16 at 20:22
  • Note that this is one of the few places where I'd use an optimized C or C++ routine instead of Java. Crypto in Java is relatively slow compared to native code, even with intrinsics. Compared to scripting languages it's blazingly fast of course. – Maarten Bodewes Jul 23 '16 at 20:26
  • I am using JDK 7. I have tried the -server and AES args as @Robert pointed out but see only a very slight improvement. 30-40% seems like a slightly large value. Clearly I have a bit to understand about which algorithm to use (AES/CBC with or without padding) and about using an IV if I can guarantee that each dataset uses a different key. I was targeting an overhead of 7-15%. – sunny Jul 23 '16 at 20:34
  • As already indicated, overhead is subjective. You cannot target an overhead unless you've got figures on what you are calculating the overhead against. Targeting a specific MB/s for a specific configuration seems more logical. Note: you probably have multiple cores and files; that sounds like a prime target for multithreading (unless your IO still spins; see the sketch after these comments). – Maarten Bodewes Jul 23 '16 at 21:36
  • Yes this will be run on a hadoop cluster. With the number of nodes we have we will most likely be able to reach our target of 10TB/day. However still wanted to drill down to a single core execution to understand the intricate details. – sunny Jul 23 '16 at 21:55
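
Following up on the multithreading suggestion in the comments, a minimal sketch (Java 8 syntax for brevity; hypothetical file names, pool sized to the core count) that encrypts several files in parallel, creating one Cipher per task since Cipher instances are not thread-safe:

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ParallelEncrypt {
    public static void main(String[] args) throws Exception {
        // Hypothetical inputs; in practice derive a fresh key/IV per file
        // (never reuse the same key+IV pair for two CBC streams).
        final byte[] key = new byte[16], iv = new byte[16];
        List<Path> files = Arrays.asList(Paths.get("a.bin"), Paths.get("b.bin"));

        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (final Path in : files) {
            pool.submit(() -> {
                // One Cipher per task: Cipher is not thread-safe.
                Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
                cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(iv));
                try (InputStream src = new CipherInputStream(Files.newInputStream(in), cipher);
                     OutputStream dst = Files.newOutputStream(Paths.get(in + ".enc"))) {
                    byte[] buf = new byte[8192];
                    for (int n; (n = src.read(buf)) != -1; ) {
                        dst.write(buf, 0, n);
                    }
                }
                return null;
            });
        }
        pool.shutdown();
    }
}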