If your operation takes a long time (say, 30 seconds or longer), you may benefit from splitting results into as many chunks as you want worker processes and using Python's multiprocessing module. If the operation is quick, the overhead of spawning new processes will outweigh any benefit from using them.
Since the operation does not depend on the values already stored in v, each process can write to an independent vector and you can aggregate the results at the end. Pass each process a vector v_prime of zeros with the same length as v. Each process performs the operation above on its own portion of the output_diffs in results, incrementing the corresponding values in v_prime instead of v, and returns its v_prime when it finishes. Summing all of the returned v_primes together with the original v gives the correct result (this is where having the items expressed as numpy arrays helps, since numpy vectors of the same length are trivial to add).
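A minimal sketch of this pattern is below. Since the question doesn't show the actual operation, it assumes each output_diff in results is a numpy array of indices whose entries in the vector should be incremented; the function names and the exact structure of results are hypothetical, so adapt the worker body to whatever your real update looks like:

```python
import numpy as np
from multiprocessing import Pool


def process_chunk(args):
    """Hypothetical worker: accumulate one chunk of output_diffs
    into a fresh zero vector instead of touching the shared v."""
    chunk, length = args
    v_prime = np.zeros(length)
    for output_diff in chunk:
        # Assumption: each output_diff is an array of indices to
        # increment. np.add.at handles repeated indices correctly.
        np.add.at(v_prime, output_diff, 1)
    return v_prime


def parallel_update(v, results, n_procs=4):
    """Split results into n_procs chunks, process each chunk in a
    separate process, then sum the partial vectors with the original v."""
    index_chunks = np.array_split(np.arange(len(results)), n_procs)
    work = [([results[i] for i in idx], len(v)) for idx in index_chunks]
    with Pool(n_procs) as pool:
        partials = pool.map(process_chunk, work)
    # Adding same-length numpy vectors aggregates all partial results.
    return v + sum(partials)
```

Because each worker only ever writes to its private v_prime, no locking or shared memory is needed; the only inter-process cost is pickling the chunks out and the partial vectors back.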