multiprocessing nested for

Question

I'm trying to get a result out of the process shown below, but it takes too long for the number of steps needed. How can I use multiprocessing to speed it up?

Within the class I am building, I have the following definition

    def class_corr(self,Delta,xi_q,Q,counter):
        to = self.t
        npts = 512
        x0 = 12
        dx =2*x0/npts
        f1 = 0  # accumulator for the F-weighted sum
        fn = 0  # accumulator for the G-weighted sum
        for n in range(0,npts+1):
            qi = -x0+n*dx
            for m in range(0,npts+1):
                pj = -x0+m*dx
                for k in range(0,npts+1):
                    xi_pk = -x0+k*dx
                    f1    += dx**3*wt(qi,pj,qo,po,to)*F(qi,xi_pk,Delta, Q)
                    fn += dx**3*wt(qi,pj,qo,po,to)*G(qi,xi_pk,xi_q,Delta,Q)
        if counter:
            return  [f1, fn/f1]
        return  fn/f1

Is it even reasonable to use multiprocessing?

So far, I have checked these:

but I haven't been able to implement those nor gotten a solution.

I am bad at Python, but it seems you can run the two inner loops in parallel. You will need to synchronize the read-and-update operations (+=) on the f1 and fn variables, or use something like AtomicDouble in Java. — Yury Nevinitsin, Jan 31 '20 at 18:48
Not really what you're looking for, but why not move dx**3 out of your nested loops? It looks as though you only really need to calculate it once. Similarly, why make two calls to wt(qi, pj, qo, po, to) inside the k-loop instead of a single call inside the m-loop? — user1245262, Jan 31 '20 at 18:49

score 2 · Answer 1 · answered Jan 31 '20 at 19:10

Thinking about it, what you really have here is a dynamic-programming-style problem: you keep recalculating the same terms. For instance, you only need to calculate dx^3 once, yet you do it about 2*npts^3 times (twice per innermost iteration). Similarly, you only need to calculate each wt(qi,pj,qo,po,to) term once per (qi, pj) pair, but you do it about 2*npts times.

Try something like:

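    def class_corr(self,Delta,xi_q,Q,counter):
        to = self.t
        npts = 512
        x0 = 12
        dx =2*x0/npts
        dx3 = dx**3  # computed once instead of inside the loops
        f1 = 0  # accumulators must start at zero before the += below
        fn = 0
        for n in range(0,npts+1):
            qi = -x0+n*dx
            for m in range(0,npts+1):
                pj = -x0+m*dx
                # wt depends only on qi and pj, so evaluate it once per (n, m);
                # qo and po are assumed to be defined elsewhere, as in the question
                wt_curr = wt(qi,pj,qo,po,to)
                for k in range(0,npts+1):
                    xi_pk = -x0+k*dx
                    f1 += dx3*wt_curr*F(qi,xi_pk,Delta, Q)
                    fn += dx3*wt_curr*G(qi,xi_pk,xi_q,Delta,Q)
        if counter:
            return  [f1, fn/f1]
        return  fn/f1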

Additionally, you calculate F and G about npts times more often than you need to. Each appears to vary only with qi and xi_pk (xi_q, Delta and Q don't change within this method), yet both are re-evaluated for every pj. If you used some sort of two-layer defaultdict to record the qi and xi_pk values for which you've already calculated F (or G), you would save a lot of unnecessary calls to F (or G).
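Here is a minimal sketch of that caching idea, using functools.lru_cache in place of a hand-rolled two-layer defaultdict. The bodies of wt, F and G below are hypothetical stand-ins (the real ones aren't shown in the question), qo, po and to are passed in as arguments purely for illustration, and npts is kept small so it runs quickly.

```python
from functools import lru_cache

# Hypothetical stand-ins: the real wt, F and G are not shown in the question.
def wt(qi, pj, qo, po, to):
    return 1.0 / (1.0 + (qi - qo) ** 2 + (pj - po) ** 2 + to)

def F(qi, xi_pk, Delta, Q):
    return 1.0 + Delta * qi * xi_pk + Q

def G(qi, xi_pk, xi_q, Delta, Q):
    return 2.0 + xi_q * qi - Delta * xi_pk + Q

def corr_cached(Delta, xi_q, Q, qo, po, to, npts=8, x0=12):
    dx = 2 * x0 / npts
    dx3 = dx ** 3
    # Build the grid once; all three loops walk over the same points.
    grid = [-x0 + n * dx for n in range(npts + 1)]

    # Delta, Q and xi_q are fixed for this call, so the cache key
    # only needs the two arguments that actually vary.
    @lru_cache(maxsize=None)
    def F_cached(qi, xi_pk):
        return F(qi, xi_pk, Delta, Q)

    @lru_cache(maxsize=None)
    def G_cached(qi, xi_pk):
        return G(qi, xi_pk, xi_q, Delta, Q)

    f1 = 0.0
    fn = 0.0
    for qi in grid:
        for pj in grid:
            wt_curr = wt(qi, pj, qo, po, to)
            for xi_pk in grid:
                # After the first pj, these are cache hits rather than real calls.
                f1 += dx3 * wt_curr * F_cached(qi, xi_pk)
                fn += dx3 * wt_curr * G_cached(qi, xi_pk)
    return f1, fn / f1

if __name__ == "__main__":
    print(corr_cached(Delta=0.3, xi_q=0.7, Q=0.5, qo=0.1, po=-0.2, to=1.0))
```

Each cache holds one entry per (qi, xi_pk) pair, so roughly npts^2 values for the whole grid. Whether this pays off depends on how expensive the real F and G are compared with a cache lookup.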

(PS - I know this wasn't the approach you're looking for, but I think it addresses the core of your problem. You're spending a lot of time recalculating terms you've already calculated.)
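Taking the same observation one step further: if wt really depends only on (qi, pj), and F and G only on (qi, xi_pk), then for each qi the double sum over m and k splits into a product of two single sums, so the innermost loop disappears altogether. Below is a rough sketch under that assumption, again with hypothetical stand-ins for wt, F and G, with qo, po and to passed in as arguments for illustration, and with a small npts.

```python
# Hypothetical stand-ins: the real wt, F and G are not shown in the question.
def wt(qi, pj, qo, po, to):
    return 1.0 / (1.0 + (qi - qo) ** 2 + (pj - po) ** 2 + to)

def F(qi, xi_pk, Delta, Q):
    return 1.0 + Delta * qi * xi_pk + Q

def G(qi, xi_pk, xi_q, Delta, Q):
    return 2.0 + xi_q * qi - Delta * xi_pk + Q

def corr_factored(Delta, xi_q, Q, qo, po, to, npts=8, x0=12):
    dx = 2 * x0 / npts
    dx3 = dx ** 3
    grid = [-x0 + n * dx for n in range(npts + 1)]

    f1 = 0.0
    fn = 0.0
    for qi in grid:
        # sum over m and k of wt(qi, pj) * F(qi, xi_pk)
        #   == (sum over m of wt) * (sum over k of F), for a fixed qi
        wt_sum = sum(wt(qi, pj, qo, po, to) for pj in grid)
        F_sum = sum(F(qi, xi_pk, Delta, Q) for xi_pk in grid)
        G_sum = sum(G(qi, xi_pk, xi_q, Delta, Q) for xi_pk in grid)
        f1 += dx3 * wt_sum * F_sum
        fn += dx3 * wt_sum * G_sum
    return f1, fn / f1

if __name__ == "__main__":
    print(corr_factored(Delta=0.3, xi_q=0.7, Q=0.5, qo=0.1, po=-0.2, to=1.0))
```

This drops the work from about npts^3 function evaluations to about npts^2. The result can differ from the triple loop in the last few digits, because the floating-point additions happen in a different order.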
