I would like to use OpenSSL for handling all our SSL communication (both client and server sides). We would like to use a HW acceleration card to offload the heavy cryptographic calculations.
We noticed that the OpenSSL 'speed' test calls the cryptographic functions directly (e.g. RSA_sign/decrypt, etc.). To fully utilize the HW capacity, multiple threads were needed (up to 128 threads) to load the card with requests and make sure the HW card is never idle.
We would like to use the high level OpenSSL API for handling SSL connections (e.g. SSL_connect/read/write/accept), but this API doesn't expose the point where the actual cryptographic operation is done. For example, when calling SSL_connect, we are not aware of the point where the RSA operations are done, so we cannot know in advance which calls will lead to heavy cryptographic calculations and refer only those to the accelerator.
Questions:
- How can I use the high level API while still fully utilizing the HW accelerator? Should I use multiple threads?
- Is there a 'standard' way of doing this? (implementation example)
- (Answered in UPDATE) Are you familiar with Intel's asynchronous OpenSSL? It seems that they were trying to solve this exact issue, but we cannot find the actual code or usage examples.
UPDATE
From Accelerating OpenSSL* Using Intel® QuickAssist Technology you can see that Intel also mentions the use of multiple threads/processes:
The standard release of OpenSSL is serial in nature, meaning it handles one connection within one context. From the point of view of cryptographic operations, the release is based on a synchronous/ blocking programming model. A major limitation is throughput can be scaled higher only by adding more threads (i.e., processes) to take advantage of core parallelization, but this will also increase context management overhead.
Intel's OpenSSL branch can finally be found here. More information is in the PDF contained here.
It looks like Intel changed the way the OpenSSL ENGINE works: it posts work to the driver and returns immediately, and the corresponding result has to be polled for later.
If you use another SSL accelerator, then the corresponding OpenSSL ENGINE would have to be modified as well.
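To make the post-and-poll idea above concrete, here is a minimal sketch in plain C. It does not use OpenSSL or Intel's code at all: FakeDevice, dev_submit, dev_tick, dev_poll and run_batch are hypothetical names made up for illustration, the "hardware" is simulated by a tick counter, and the "crypto operation" is just squaring a number. It only shows why a single caller can keep several requests in flight when submit returns immediately.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH   8  /* hypothetical number of in-flight requests */
#define LATENCY_TICKS 3  /* simulated time the "card" needs per request */

/* Hypothetical request slot; a real engine would hold a driver handle. */
typedef struct {
    int in_use;
    int done;
    int ticks_left;
    unsigned input;
    unsigned result;
} FakeRequest;

typedef struct {
    FakeRequest slots[QUEUE_DEPTH];
} FakeDevice;

/* Post a request and return at once: slot id, or -1 if the queue is full. */
int dev_submit(FakeDevice *dev, unsigned input) {
    assert(dev != NULL);
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        FakeRequest *r = &dev->slots[i];
        if (!r->in_use) {
            r->in_use = 1;
            r->done = 0;
            r->ticks_left = LATENCY_TICKS;
            r->input = input;
            r->result = 0;
            return i;
        }
    }
    return -1;
}

/* Simulate hardware progress: every in-flight request advances in parallel. */
void dev_tick(FakeDevice *dev) {
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        FakeRequest *r = &dev->slots[i];
        if (r->in_use && !r->done && --r->ticks_left == 0) {
            r->result = r->input * r->input; /* stand-in for the real operation */
            r->done = 1;
        }
    }
}

/* Poll: 1 = finished (result stored, slot freed), 0 = pending, -1 = bad id. */
int dev_poll(FakeDevice *dev, int id, unsigned *out) {
    if (id < 0 || id >= QUEUE_DEPTH || !dev->slots[id].in_use)
        return -1;
    if (!dev->slots[id].done)
        return 0;
    *out = dev->slots[id].result;
    dev->slots[id].in_use = 0;
    return 1;
}

/* Submit n requests up front, then poll until all are done.
 * Returns the number of ticks spent, or -1 if they do not fit in the queue. */
int run_batch(FakeDevice *dev, int n, const unsigned *inputs, unsigned *results) {
    int ids[QUEUE_DEPTH];
    int pending = 0, ticks = 0;
    if (n > QUEUE_DEPTH)
        return -1;
    for (int i = 0; i < n; i++) {
        ids[i] = dev_submit(dev, inputs[i]);
        if (ids[i] < 0)
            return -1;
        pending++;
    }
    while (pending > 0) {
        dev_tick(dev);
        ticks++;
        for (int i = 0; i < n; i++) {
            if (ids[i] >= 0 && dev_poll(dev, ids[i], &results[i]) == 1) {
                ids[i] = -1; /* mark as collected */
                pending--;
            }
        }
    }
    return ticks;
}
```

In this toy model, calling run_batch on a zero-initialized FakeDevice with four inputs finishes in LATENCY_TICKS ticks, whereas a blocking submit-and-wait loop would need four times that. That is the effect the multiple threads were achieving in the 'speed' test, obtained here from one caller. How a real ENGINE exposes the polling step depends on the vendor's driver and is not shown.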