I want to encrypt a list of 300 numbers using homomorphic (Paillier) encryption. This takes roughly 3000 ms on my notebook and a lot longer on my Raspberry Pi. I would like to speed that up, so I tried multithreading:
import time
from multiprocessing.dummy import Pool as ThreadPool

def test_1_performance_of_encryption(self):
    print("Test: Blind encryption performance test")
    print("-----------------------------")
    print()
    # Only blinds are encrypted
    for y in range(0, ((NUMBER_OF_RUNS//SENSOR_SAMPLES_PER_BLIND)*NUM_OF_DATA_PROCESSORS)):
        print("Round {}:".format(y+1))
        print("Encrypting {} blinds...".format(NUM_OF_SENSOR_SAMPLES))
        encrypted_blinds.clear()
        millis_start = int(round(time.time() * 1000))
        for x in range(0, NUM_OF_SENSOR_SAMPLES):
            encrypted_blinds.append(public_keys[0].encrypt(blinds[x]))
        millis_end = int(round(time.time() * 1000))
        time_elapsed_enc.append(millis_end - millis_start)
        print("Time elapsed: {}ms".format(time_elapsed_enc[y]))
    print("Test finished. Time elapsed:")
    print("Min: {} | Max: {} | Avg: {}".format(min(time_elapsed_enc), max(time_elapsed_enc),
                                               (sum(time_elapsed_enc)/len(time_elapsed_enc))))
    print()
pool = ThreadPool(NUM_OF_THREADS)  # pool built from the ThreadPool import above

@profile
def test_1a_performance_of_encryption_multithreaded(self):
    print("Test: Blind encryption performance test with {} threads".format(NUM_OF_THREADS))
    for y in range(0, ((NUMBER_OF_RUNS//SENSOR_SAMPLES_PER_BLIND)*NUM_OF_DATA_PROCESSORS)):
        print("Round {}:".format(y+1))
        print("Encrypting {} blinds...".format(len(blinds)))
        millis_start = int(round(time.time() * 1000))
        encrypted_blinds_multithreaded = pool.map(public_keys[0].encrypt, blinds)
        millis_end = int(round(time.time() * 1000))
        time_elapsed_enc_multithread.append(millis_end - millis_start)
        print("Time elapsed: {}ms".format(time_elapsed_enc_multithread[y]))
    print("Test finished. Time elapsed:")
    print("Min: {} | Max: {} | Avg: {}".format(min(time_elapsed_enc_multithread), max(time_elapsed_enc_multithread),
                                               (sum(time_elapsed_enc_multithread) / len(time_elapsed_enc_multithread))))
    print()
However, both tests finish in more or less exactly the same amount of time. While the single-threaded method keeps one core at 100%, the multithreaded version spreads work across all of them, yet the total load still settles at exactly 1 (equal to one core at 100%). Am I doing anything wrong here? I have read this question and its answers: Python multiprocessing.Pool() doesn't use 100% of each CPU. However, I don't believe the cause here is interprocess communication, as it would be very strange for that to settle at exactly a load of 1...
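This behaviour can be reproduced without any encryption library at all. The sketch below (with a hypothetical `cpu_task` standing in for `public_keys[0].encrypt`) shows that a thread pool gives essentially no speedup for CPU-bound pure-Python work, because the GIL allows only one thread to execute Python bytecode at a time:

```python
# Minimal repro: thread pool vs. serial loop on CPU-bound pure-Python work.
# cpu_task is a hypothetical stand-in for an expensive call like
# public_keys[0].encrypt; threads cannot run it in parallel due to the GIL.
import time
from multiprocessing.dummy import Pool as ThreadPool

def cpu_task(n):
    return sum(i * i for i in range(n))

data = [20_000] * 300

start = time.perf_counter()
serial = [cpu_task(n) for n in data]
t_serial = time.perf_counter() - start

with ThreadPool(4) as pool:
    start = time.perf_counter()
    threaded = pool.map(cpu_task, data)
    t_threaded = time.perf_counter() - start

# Both timings come out roughly equal: total CPU load stays around 1 core.
print("serial: {:.2f}s, 4 threads: {:.2f}s".format(t_serial, t_threaded))
```
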
EDIT: I was using multiprocessing.dummy instead of multiprocessing. This uses multiple threads instead of processes, and multiple threads cannot execute Python code in parallel due to the GIL (global interpreter lock). I fixed it by changing multiprocessing.dummy to multiprocessing. I now have n processes and 100% CPU usage.