Is 300ms for encrypting and then decrypting a single 8 * 1K message an acceptable RSA performance?

Question

Note: for those of you that cannot do better than coming up with boring, witless comments and even suggestions to close a valid question, please see the accepted answer here: Using GNU/Linux system call `splice` for zero-copy Socket to Socket data transfers in Haskell as an excellent example of how to be of proper help to those that really seek constructive answers!!

Hi I was just reading PowerMod in Mathematica 8's documentation and wanted to test the Haksell RSA package (ghc --make -O2 -O3 -fllvm -optlo-O3 test.hs):

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Monad
import System.Random
import Codec.Crypto.RSA
import Data.ByteString.Lazy
import Data.ByteString.Char8

import Criterion.Main
import Criterion.Config

main :: IO ()
main = do
  print m1
  print m4
  print m8
  defaultMainWith defaultConfig (return ()) [
    bgroup "RSA" [
       bench "1" $ ed m1
     , bench "4" $ ed m4
     , bench "8" $ ed m8
     ]
   ]

m1 = fromChunks [ Data.ByteString.Char8.replicate (1*1024) '0' ]
m4 = fromChunks [ Data.ByteString.Char8.replicate (4*1024) '0' ]
m8 = fromChunks [ Data.ByteString.Char8.replicate (8*1024) '0' ]

ed m = do
  g1 <- newStdGen
  let (el,il,g2) = generateKeyPair g1 1024
  loop 1 g2 el il m

loop :: RandomGen g => Int -> g -> PublicKey -> PrivateKey -> Data.ByteString.Lazy.ByteString -> IO ()
loop n g e i m = do
  let   nn     = n-1
  let  (em,ng) = encrypt g e  m
  let   dm     = decrypt   i em
  when (m == dm) $ Data.ByteString.Char8.putStr "1"
  when (nn > 0 ) $ loop nn ng e i m

Also tried this in Mathematica:

{p, q} = Prime[RandomInteger[{10^4, 10^5}, {2}]];
{p, q, n = p q}
\[Lambda] = CarmichaelLambda[n]
d = NestWhile[#1 + 1 & , Round[n/3], GCD[\[Lambda], #1] =!= 1 &]
e = PowerMod[d, -1, \[Lambda]]
enc = PowerMod[#, e, n] &;
dec = PowerMod[#, d, n] &;
c = ConstantArray[48, 8 1024];
t = Table[c // enc // dec; // AbsoluteTiming, {10}][[All, 1]]

Timings both in Haskell (m8) and Mathematica cases are similar:

{0.313015, 0.302337, 0.303766, 0.303321, 0.303018, 0.302574, \
0.302511, 0.303958, 0.301411, 0.300820}

Is 300ms per 8192-bytes-long message an acceptable performance for RSA? How do OpenSSL or other implementations compare?

(Test rig: 64-bit linux; 4xCORE, Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz)

@OliCharlesworth Is this really your answer!? I am interested in others' experience with RSA implementations. — Cetin Sert, Apr 09 '12 at 15:03
@OliCharlesworth I also DID run it in two languages and put the timings in my question: I really don't get your attitude. — Cetin Sert, Apr 09 '12 at 15:05
It's a comment, not an answer... But in all seriousness, it wouldn't be a bad idea to just download OpenSSL and try it out. — Oliver Charlesworth, Apr 09 '12 at 15:05
Fair enough. But what your question really boils down to is "how fast is OpenSSL in your experience?", which isn't necessarily the best fit for Stack Overflow. Given that OpenSSL is a commonly-used implementation, running that should probably be a good-enough indication of how fast/slow the Mathematica version is compared to the norm. Plus, it would be the fairest possible comparison (running on the same machine!). — Oliver Charlesworth, Apr 09 '12 at 15:12
@OliCharlesworth o____O Ok ... ok I will do so when I have time and update this question as well. — Cetin Sert, Apr 09 '12 at 15:18
If it's of any help, Googling for "OpenSSL benchmark" returns things like this: http://wiki.openwrt.org/inbox/benchmark.openssl, which includes RSA results. I'm not sure what the numbers mean, though. — Oliver Charlesworth, Apr 09 '12 at 15:21
@OliCharlesworth Glad that you have now found something which was perhaps the reason I asked my question in the first place: lots of numbers with different environmental settings and different parameter values for key or message lengths; short = numbers that do not mean anything to me and now you as well :) — Cetin Sert, Apr 09 '12 at 15:25
In your Haskell benchmark, you include the time for generating the key pair. a) is that included in the Mathematica times too, b) is it sensible to include it? — Daniel Fischer, Apr 09 '12 at 15:36
@DanielFischer I am considering using RSA in a network application where the key pairs do not change as long as a connection is running, probably even during the life times of the server / client processes. That is why I focus on the performance of `decrypt ∘ encrypt`. Randomness of key pairs etc. are also of no importance at the moment. — Cetin Sert, Apr 09 '12 at 15:42
I just patched RSA to use CryptoRandomGen, so if you didn't use a version from the past few days then the performance will be (minorly) different. — Thomas M. DuBuisson, Apr 10 '12 at 04:19

score 8 · Accepted Answer · answered Apr 10 '12 at 06:14

First off, good question - the performance difference of RSA to OpenSSL is a question I had too. That said, here's a bunch of text that doesn't give the answer.

The Haskell RSA Package Changed

I've recently moved RSA to using CryptoRandomGen from RandomGen. You are using the painfully slow StdGen so switching to the generator in the intel-aes package or HashDRBG (perhaps a buffered version) from the DRBG package will help.

This is not how you're supposed to use Public Key Cryptography

Generally you use public keys to either exchange a secret key or encrypt a secret key such that only the recipient can decrypt it. You seem to be intending to use RSA to continually encrypt a stream of messages. The performance of RSA is of such little concern to people precisely because it is such a rare operation.

Proper Benchmarking

As Daniel said, you are currently benchmarking key generation, encryption and decryption all in one batch. You responded that you won't be generating many keys, just doing lots of enc/dec operations... so don't you think you should fix the benchmark?

Also, you're benchmark seems incomplete and thus suspect - at the very least it's missing an import.

Other Alarming Things

You say "Randomnes of key pairs [are] of no importance at the moment." Until they are important, there is no reason bothering with cryptography.

Benchmarking Oli also had a good point. Benchmarking OpenSSL is the way to go.

From the command line (which as far as I'm going with the part of the answer) OpenSSL forces you to use RSA semi-correctly, so we'll just be benchmarking encryption of really small files:

dd if=/dev/urandom of=64B bs=64 count=1
openssl genrsa -out test.key 1024
openssl rsa -in test.key -out public.pem -outform PEM -pubout
openssl rsa -in test.key -out private.pem -outform PEM
time openssl rsautl -raw -ssl -encrypt -inkey private.pem -in 64B -out 64B.enc

Which gives us anywhere from 5 to 12 ms.

Now for the Haskell. Aside from cosmetic changes, I've moved to the new RSA using CryptoRandomGen and the not-so-fast but OK HashDRBG generator at the same time as making your encrypt function pure and ditching the unneeded comparison. We end up with:

import Criterion.Main
import Criterion.Config
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Codec.Crypto.RSA
import Crypto.Random.DRBG

main :: IO ()
main = do
  r1 <- newGenIO :: IO HashDRBG
  r2 <- newGenIO :: IO (GenBuffered HashDRBG)

  -- We don't care about the performance of generate, so we do it outside the benchmark framework
  let (pub,priv,g2) = generateKeyPair r2 1024

  defaultMainWith defaultConfig (return ()) [
    bgroup "RSA" [
       bench "1" $ whnf (enc r1 pub priv) m1
       , bench "2" $ whnf (enc r2 pub priv) m1
     ]
   ]

m1 :: L.ByteString
m1 = L.pack [0..63]

enc :: CryptoRandomGen g => g -> PublicKey -> PrivateKey -> L.ByteString -> L.ByteString
enc g pub priv m = 
    let (em,ng) = encrypt g pub m
        dm     = decrypt   priv em 
    in dm

This yields measurements around 3.5ms (compiled with GHC 7.4 and -O2). To be clear: I'm not saying RSA is faster than OpenSSL - the OpenSSL test had a LOT more overhead (loading the executable, reading the key, reading the plaintext, encrypting, writing the result) and it very believably could be an order of magnitude faster than the RSA package. What I am saying is "Hey look, the Haskell RSA code performed arbitrarily fast to the point where I don't really care and you can perfect the benchmark more if you'd like."

For reference, openssl speed rsa1024 says it sign's in 0.5 ms (on my machine, obviously), which I suspect is an RSA encrypt of 16 bytes along with other operations.

score 3 · Answer 2 · edited Jun 20 '20 at 09:12

The short answer is I don't know, I have nothing to compare. I tried to find out how to make openssl perform the same, but haven't found any understandable (by me) documentation. So all I could do was play around with the Haskell RSA code (btw., your code doesn't work with the latest version on hackage, the relevant constraints are now CryptoRandomGen instead of RandomGen, so I fudged an instance CryptoRandomGen Stdgen, certainly suboptimal, but that doesn't make a big difference).

The conclusion of my playing around is that the Haskell RSA implementation is certainly improvable, but probably not obscenely slow. Whether its speed is acceptable depends of course on your needs.

The starting point is your benchmark code with my fudged CryptoRandomGen instance. The reported mean for the 8K string was ~360ms on my computer (64 bit linux, Core i5 M2410, 2.3GHz).

Moving the key generation out of the benchmark brought the time down to ~295ms, so the key generation is rather significant, mostly due to suboptimal prime testing. A better implementation of the strong Fermat test reduces the prime finding time much, and using the Baillie PSW test instead of Miller-Rabin brings a further reduction.

Outside the key generation, the modular exponentiation implementation is lacking, it uses one-bit shifts on the exponent, which is rather slow for large Integers. Improving that brought a noticeable but not dramatic reduction in running time.

Another point where I could improve performance a bit is the conversion between octet streams and Integers, but the difference was minor compared to the above.

With the obvious things done, it remained to find out where the time is spent. It turned out that the vast majority of time is spent doing modular exponentiation. Fortunately it is relatively easy to benchmark the modular exponentiation algorithm in isolation and compare it to GMP's performance. A quick and dirty test suggests that (on my computer at least) GMP's mpz_powm does the job about 2-2.5 times as fast as my Haskell implementation (so if GHC provided a direct binding to mpz_powm, a major speedup could be expected).

All in all, the 8K benchmark runs now in ~270ms with key generation included and ~245ms with key generation excluded, about 200ms thereof are used for modular exponentiation.

Assuming the size limit doesn't allow a dramatically better modular exponentiation than GMP's, I estimate that a good C implementation would be 5-10 times as fast as the current RSA package's¹.

Would that be acceptable? Your call.

¹ But I half expect to be off by a nontrivial factor.

Is 300ms for encrypting and then decrypting a single 8 * 1K message an acceptable RSA performance?

2 Answers2