I recently found out that the bottleneck of my code is the following block. N is on the order of 10,000, and L on the order of (10,000)^2. RQ_func is just a function that takes two indices (tuples) and returns a float V and a dictionary sp_dist in {index: probability} format.
Is there a way I can parallelize this code? I have access to a cluster on which I can use up to 20 cores at a time, and I would like to take advantage of that.
import numpy as np
import scipy.sparse

R = np.empty((L,))                   # one entry per (state, action) pair
Q = scipy.sparse.lil_matrix((L, N))  # one row per (state, action) pair, one column per state

# Populate R and Q by traversing all (state, action) pairs
traverser = 0
for s_index in state_indices:
    for a_index in action_indices:
        V, sp_dist = RQ_func(s_index, a_index)
        R[traverser] = V
        for sp_index, prob in sp_dist.items():
            Q[traverser, sp_index] = prob
        traverser += 1