I receive a lot of non-sensitive data from a server via SSL. As the client, I'd like to configure the OpenSSL session to be as fast as possible.

I've queried the server to list its accepted SSL ciphers (see below).

I was hoping the server enabled the NULL SSL ciphers, but I don't think it does?

  1. Which of these encryption algorithms would be the fastest on a modern (2015+) Intel x86 processor?
  2. Are there any other OpenSSL settings/flags/modes/compiler switches I can change to improve performance, ignoring security?

Output:

NSOCK ERROR [0.1220s] ssl_init_helper(): OpenSSL legacy provider failed to load.
Host is up (0.017s latency).
Other addresses for ********** (not scanned): **************************
PORT    STATE SERVICE
443/tcp open  https
| ssl-enum-ciphers: 
|   TLSv1.2: 
|     ciphers: 
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256-draft (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|       TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256-draft (ecdh_x25519) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
|       TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
|     compressors: 
|       NULL
|     cipher preference: client
|   TLSv1.3: 
|     ciphers: 
|       TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
|       TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
|       TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
|     cipher preference: client
|_  least strength: A

Nmap done: 1 IP address (1 host up) scanned in 2.62 seconds
intrigued_66

1 Answer

The public-key algorithm might influence the speed of connection setup, but I think once the connection is up and running, only the symmetric cipher and the MAC (message authentication code) dominate performance (e.g. AES-128 and SHA1).

`openssl speed` should tell you about the raw algorithms. For the public-key algorithms, I guess look at verify/sec for whatever key size your server uses for those keys.
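A minimal sketch of driving that benchmark from Python. The helper names `speed_command` and `run_speed` are made up for illustration. It passes `-evp`, which selects the algorithm through OpenSSL's high-level EVP interface; as far as I know, that is the path where hardware-accelerated code such as AES-NI is used, and it is also how AEAD ciphers like `aes-128-gcm` can be benchmarked. Treat the choice of algorithms as an assumption to adapt to your own server's list.

```python
import shutil
import subprocess

def speed_command(algorithm, seconds=3):
    """Build an `openssl speed` argv that benchmarks `algorithm` via the EVP interface."""
    return ["openssl", "speed", "-seconds", str(seconds), "-evp", algorithm]

def run_speed(algorithm, seconds=3):
    """Run the benchmark and return its stdout, or None if no openssl binary is on PATH."""
    if shutil.which("openssl") is None:
        return None
    result = subprocess.run(
        speed_command(algorithm, seconds),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout
```

For example, `print(run_speed("aes-128-gcm", seconds=1))` would print the throughput table for AES-128-GCM, assuming an OpenSSL 1.1.1 or newer binary is installed (the `-seconds` option is not available in older releases).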

AES has HW acceleration in x86 since Westmere (AES new instructions, aka AES-NI), so you definitely want that as the symmetric cipher. AES128 is somewhat faster than 192 or 256. IDK which mode, CBC or GCM, would be faster.
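Because the scan in the question reports "cipher preference: client" for TLSv1.2, the client's own cipher list decides which suite gets negotiated. Here is a hedged sketch using Python's standard `ssl` module (which wraps OpenSSL) to offer only the AES-128-GCM suites for TLS 1.2. The function names are hypothetical. The OpenSSL names `ECDHE-ECDSA-AES128-GCM-SHA256` and `ECDHE-RSA-AES128-GCM-SHA256` correspond to `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256` and `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256` in the scan. Note that `set_ciphers()` only affects TLS 1.2 and older; TLS 1.3 suites are configured separately by OpenSSL and stay enabled.

```python
import ssl

# TLS 1.2 suites to offer, in OpenSSL naming (an assumption: pick whatever benchmarks fastest for you).
FAST_TLS12_SUITES = "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"

def fast_client_context(cipher_string=FAST_TLS12_SUITES):
    """Return a client-side context that only offers the given TLS 1.2 suites."""
    ctx = ssl.create_default_context()  # keeps certificate verification on
    ctx.set_ciphers(cipher_string)      # raises ssl.SSLError if nothing matches
    return ctx

def enabled_suites(ctx):
    """List the names of the suites the context will offer."""
    return [cipher["name"] for cipher in ctx.get_ciphers()]
```

After connecting with `ctx.wrap_socket(sock, server_hostname=host)`, the socket's `cipher()` method reports which suite was actually negotiated, which is worth checking before trusting any benchmark.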

Out of the choices for hash, without SHA extensions, SHA1 (160-bit) is fastest, followed by SHA-512, followed by SHA256.
(SHA-512 has different internals that allow a more efficient SIMD implementation than SHA256, IIRC not having to shuffle in chunks as small or something like that.)

See the Q&A "Are there in x86 any instructions to accelerate SHA (SHA1/2/256/512) encoding?": SHA new-instructions (SHA-NI) still aren't widespread, and a 2015 CPU won't have them:

  • AMD since Zen 1. (550% speedup in OpenSSL speed test).
  • Intel low-power since Goldmont
  • Intel mainstream since Ice Lake (actually Cannon Lake, which existed in a couple laptop models.)

On a CPU with SHA-NI, SHA1 will pull even farther ahead of SHA-512. SHA256 will also be sped up, so it'll be close to or maybe equal with SHA1. But SHA-512 isn't accelerated by SHA-NI.
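To check which of these extensions a given Linux machine has, one option is to look at the `flags` line of `/proc/cpuinfo`, where AES-NI shows up as `aes` and SHA-NI as `sha_ni`. The sketch below is Linux-specific, and the function names are made up for illustration; it takes the file contents as a string so it can be tried on any text.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. not an x86 Linux system)

def crypto_extensions(flags):
    """Map the extensions discussed above to whether the CPU advertises them."""
    return {
        "AES-NI": "aes" in flags,
        "SHA-NI": "sha_ni" in flags,
    }
```

On a real system this would be called as `crypto_extensions(cpu_flags(open("/proc/cpuinfo").read()))`. A CPU from the Skylake generation would be expected to report AES-NI but not SHA-NI.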


Test results with openssl speed, with OpenSSL version 1.1.1q on Arch GNU/Linux, using its binary packages.

CPU: i7-6700k Skylake, which was released in 2015, running at 4.2GHz with the rest of the system mostly idle except me typing this answer in Chromium. energy_performance_preference was set to performance on all cores. (Linux 5.19 kernel).

DDR4-2666 DRAM, although that shouldn't matter as everything will hit in cache. L1d cache size is 32K, L2 cache size is 256K. So with the largest chunk size openssl tested (16K), ciphers that copy from one buffer to another touch 2 x 16K = 32K, which completely fills L1d cache and probably causes some misses. But they'll still get L2 cache hits, so it's probably not a bottleneck.

Hashes that just read the data will avoid all cache misses.

My CPU has AVX2, not AVX-512. AVX-512 may help for SHA. Unlikely to help with AES, although it could get something out of VAES extensions to use even less CPU time decrypting AES.

Block ciphers: AES-128 is the fastest block cipher for 8K / 16K blocks, as expected. (RC4 is about twice as fast, but is definitely insecure and not one of the SSL/TLS options.)

# selected results from openssl speed

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes

aes-128 cbc   281173.41k   294162.15k   295066.20k   297662.07k   298541.06k   293197.14k
aes-192 cbc   241961.70k   252595.05k   252103.08k   257893.89k   254962.35k   256365.91k
aes-256 cbc   215042.86k   223668.33k   220263.25k   223360.00k   223918.32k   223477.76k
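To compare rows like these programmatically, here is a small sketch that parses `openssl speed` result rows and expresses each key size relative to the first row. The parser and its names are my own, not part of OpenSSL; it assumes every throughput field ends in `k` (1000s of bytes per second), as in the table above.

```python
import re

_RATE = re.compile(r"^\d+(?:\.\d+)?k$")  # e.g. "281173.41k"

def parse_speed_row(line):
    """Split one result row into (algorithm name, [bytes per second, ...])."""
    fields = line.split()
    split_at = len(fields)
    # Walk back from the end over the throughput fields; the rest is the name.
    while split_at > 0 and _RATE.match(fields[split_at - 1]):
        split_at -= 1
    name = " ".join(fields[:split_at])
    rates = [float(f[:-1]) * 1000.0 for f in fields[split_at:]]
    return name, rates

ROWS = """\
aes-128 cbc   281173.41k   294162.15k   295066.20k   297662.07k   298541.06k   293197.14k
aes-192 cbc   241961.70k   252595.05k   252103.08k   257893.89k   254962.35k   256365.91k
aes-256 cbc   215042.86k   223668.33k   220263.25k   223360.00k   223918.32k   223477.76k
"""

def relative_to_first(rows_text, column=-1):
    """Return {name: rate / first row's rate} for one block-size column (default: the largest)."""
    parsed = [parse_speed_row(l) for l in rows_text.splitlines() if l.strip()]
    baseline = parsed[0][1][column]
    return {name: rates[column] / baseline for name, rates in parsed}

for name, ratio in relative_to_first(ROWS).items():
    print(f"{name}: {ratio:.2f}x the aes-128 cbc rate at 16384 bytes")
```

With the numbers above, AES-256-CBC comes out at roughly three quarters of the AES-128-CBC rate for 16K blocks.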

Hashes: SHA1 (160-bit) is fastest, SHA256 is slowest, as expected if you know that SHA-512 is more amenable to AVX2 or AVX-512 implementation without dedicated support from SHA-NI.

MD5 was about the same speed as SHA512 (148028 in 3.0s), so you wouldn't want it even if it were available. Other non-standard hashes were slower than SHA1.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes

sha1          173375.55k   401452.57k   800641.11k  1037854.38k  1152974.85k  1164574.72k
sha256        101270.31k   218436.46k   397949.95k   491700.57k   527214.11k   525238.27k
sha512         68755.59k   273395.48k   457441.88k   675966.63k   782669.14k   787906.56k

md5           166305.53k   367672.73k   616191.32k   761831.08k   800680.62k   811134.03k
hmac(md5)      65136.17k   198369.19k   459400.36k   662287.36k   798291.29k   802570.24k

So SHA1 runs at about 1.1 GB/s, while AES128-CBC runs at a measly 300 MB/s on a single core, only about 2.3x the 125 MB/s line rate of gigabit Ethernet. (memcpy speed is more like 20 GB/s, so these are both much slower than that.)
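The arithmetic behind that comparison, as a rough sketch. The 16384-byte figures are taken from the tables above; gigabit Ethernet is taken as 10^9 bits/s with framing overhead ignored. The combined estimate is my own simplifying assumption, not a measurement: it treats the cipher and the hash as running one after the other on the same core over the same bytes, and ignores HMAC and TLS record overhead.

```python
GIGABIT_BYTES_PER_S = 1e9 / 8      # 125 MB/s, ignoring Ethernet/IP/TCP overhead

AES128_CBC = 293197.14e3           # bytes/s, 16384-byte blocks, from the table above
SHA1 = 1164574.72e3                # bytes/s, 16384-byte blocks, from the table above

def times_line_rate(bytes_per_s, link_bytes_per_s=GIGABIT_BYTES_PER_S):
    """How many times faster than the link one core can process data."""
    return bytes_per_s / link_bytes_per_s

def serial_rate(cipher_rate, mac_rate):
    """Estimated throughput when each byte is both decrypted and hashed on one core.

    Assumption: the per-byte times simply add up (no overlap, no extra overhead).
    """
    return 1.0 / (1.0 / cipher_rate + 1.0 / mac_rate)

print(f"AES-128-CBC alone: {times_line_rate(AES128_CBC):.2f}x gigabit line rate")
combined = serial_rate(AES128_CBC, SHA1)
print(f"AES-128-CBC + SHA1 (estimate): {combined / 1e6:.0f} MB/s")
```

Under this model the MAC costs noticeably less than the cipher, so the choice of symmetric cipher matters more than the choice of hash for a CBC suite.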

If your receiver uses multiple connections, it could distribute that CPU load across multiple threads and thus cores.


For crypto benchmarks on more recent CPUs, have a look at Phoronix articles, e.g. Apple M2 vs. AMD Rembrandt (Zen 3) vs. Intel Alder Lake Linux Benchmarks. They're parallelizing their crypto throughput so for example the SHA256 throughput scales with core count.

Again, this answer is assuming you use software that takes advantage of your CPU's instruction-sets like OpenSSL does, to get similar speeds to OpenSSL.

Peter Cordes
  • A ciphersuite that uses AEAD (e.g. Chacha20 + Poly1305, GCM modes) should be faster than one that doesn't since there's no separate MAC to compute (CBC is also serial when encrypting). So `TLS_AKE_WITH_AES_128_GCM_SHA256` should be the fastest with AES-NI I think while `TLS_AKE_WITH_CHACHA20_POLY1305_SHA256` should be the fastest without them. As for the KEA, ECC is well known to be faster than RSA. – Margaret Bloom Oct 26 '22 at 07:52
  • @MargaretBloom: I wondered why `openssl speed` didn't test any GCM modes, only CBC and IGE (which were about the same speed). Strange that there's no AES_128_GCM using SHA1 in the OP's list of ciphersuites, only GCM paired with SHA256 or higher. – Peter Cordes Oct 26 '22 at 12:56
  • SHA1 is insecure and pretty much phased out. Many (most?) servers won't accept a SHA1 ciphersuite. – President James K. Polk Oct 27 '22 at 11:43