
I'm attempting to get multiprocessing working in Python 3.6 (the Anaconda distribution). I've heavily tested my internal function (numerical integration), so I'm confident that it works. What is currently giving me trouble is passing the proper ranges, because I get some `None` returns.

import multiprocessing
from multiprocessing import Pool
import numpy as np  # needed for np.linspace / np.inf below

def chunkwise(t, size=2):
    it = iter(t)
    return zip(*[it]*size)

def sint(tupl):
    print('arg = ',tupl)
    #lower = float(tupl[0])
    #upper = float(tupl[1])
    exit()
    #ans = scipy.integrate.quad(int2,lower,upper) 
    #return ans

n_CPUs = 6 

smin = float(10000)
smax = float(np.inf)
smax_spacing = float(2.5*10**12)
srange = np.linspace(smin,smax_spacing,n_CPUs)

srange = np.append(srange,np.inf)
print('length srange = ',len(srange))
filler=[]

for i in range(len(srange)):
    if i == 0:
        filler.append(float(srange[i]))
    elif srange[i] == srange[-1]:
        filler.append(float(srange[i]))
    else:
        filler.append(float(srange[i]))
        filler.append(float(srange[i]))
srange = np.array(filler)
srange = list(chunkwise(srange))

def main():
    pool = Pool(processes=n_CPUs)
    res1 = pool.map(sint,[(smin,float(smin*2)),  (float(smin*2),float(smin*3))])#srange)
    res = sum(res1)
    pool.close()
    pool.join()
    return res

if __name__ =="__main__":
    result = main()

Some of my debugging process can be seen in the code I included here. At the moment, I just want to see the arguments that are being passed to my sint() function. When I print the result, I get

arg = (number,bigger number)
None
arg = (number2, bigger number2)
None

Why are these `None`s arising? At present, their presence is causing overflows/NaNs that aren't present in the non-parallelized version of the code. Is there a way to keep the `None`s from showing up? I tried checking for `None` in tupl, lower, and upper, but Python doesn't seem to detect them (the "None detected" message I wrote in was never printed).

Any help would be very appreciated! Let me know if more information is needed.

PhysicistAbroad

1 Answer


One issue is that multiprocessing launches a separate process for each worker; it creates a separate Python interpreter entirely, so everything you've put in global scope actually runs multiple times. Running your code prints

>>> length srange =  7
>>> length srange =  7

multiple times for me. You need to move that setup code either into a separate function or inside main(). Fixing this still results in Nones, however, which appears to be because you don't actually return anything from your mapping function, sint, in pool.map. Normally your results would then be None objects (and sum cannot sum None objects either), but there's another problem here: your processes don't actually close.

This is probably because you call exit(): the function never returns anything, not even None.

You don't call exit() to end a mapping function; look at the examples in the multiprocessing documentation. Just use a normal function that returns its result as your mapper; there's no need for a system call.
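To make that concrete, here is a minimal sketch of sint with the exit() removed and a return added (int2 below is only a placeholder, since the real integrand wasn't posted):

from multiprocessing import Pool
import scipy.integrate

def int2(x):
    # placeholder for the real integrand, which wasn't posted
    return x ** 2

def sint(tupl):
    lower = float(tupl[0])
    upper = float(tupl[1])
    ans, err = scipy.integrate.quad(int2, lower, upper)
    return ans  # a mapper must return its result, or pool.map collects Nones

def main():
    smin = 10000.0  # setup lives inside main, not at module scope
    pool = Pool(processes=6)
    res = sum(pool.map(sint, [(smin, smin * 2), (smin * 2, smin * 3)]))
    pool.close()
    pool.join()
    return res

if __name__ == "__main__":
    print(main())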

Even though this is not exactly what you want, here is a fuller example showing actually functioning multiprocessing code based on yours:

EDIT: I didn't realize most of what you posted was not required. I encourage you to make minimal, verifiable examples when you post questions; I've minified and changed what I originally posted so that it does actual integration. I also encourage you to use proper naming conventions when you ask questions and write your own code: sint and tupl are not acceptably descriptive names. What I've done here is show you how integration can be carried out properly in parallel using the same scipy integration utility you mentioned. You can replace integrated_function with the code for your own function and it should work the same.

from multiprocessing import Pool
from scipy import integrate


def integrated_function(x):
    return x ** 2


def integration_process(integration_range):
    print("thread launched, tuple = ", integration_range)
    lower = float(integration_range[0])
    upper = float(integration_range[1])
    y, err = integrate.quad(integrated_function, lower, upper)
    return y


def main():
    # notice how we put this inside this main function
    n_CPUs = 6
    total_integration_range = 60000
    integration_chunks = 6
    integration_step = total_integration_range / integration_chunks
    integration_ranges = [(i * integration_step, (i + 1) * integration_step) for i in range(integration_chunks)]
    pool = Pool(processes=n_CPUs)
    res1 = pool.map(integration_process, integration_ranges)  # srange)
    res = sum(res1)
    print(res)
    pool.close()
    pool.join()
    return res


if __name__ == "__main__":
    result = main()
    # thread launched, tuple = (0, 10000)
    # thread launched, tuple = (10000, 20000)
    # thread launched, tuple = (20000, 30000)
    # thread launched, tuple = (30000, 40000)
    # thread launched, tuple = (40000, 50000)
    # thread launched, tuple = (50000, 60000)
    # 72000000000000.0

If your function is complicated enough and the integration range is large enough, the overhead of multiprocessing should be low enough for it to be faster. Note that printing from within worker processes causes slowdown you don't want, so outside of debugging I would encourage you not to print.
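As a rough way to check that, you can time the serial and pooled versions side by side, reusing the toy integrand above. For something this cheap the pool will almost certainly lose; it only wins once each call does real work:

import time
from multiprocessing import Pool
from scipy import integrate

def integrated_function(x):
    return x ** 2

def integration_process(integration_range):
    lower, upper = integration_range
    y, err = integrate.quad(integrated_function, lower, upper)
    return y

def main():
    integration_ranges = [(i * 10000, (i + 1) * 10000) for i in range(6)]

    start = time.perf_counter()
    serial = sum(integration_process(r) for r in integration_ranges)
    print("serial:", serial, "took", time.perf_counter() - start, "s")

    start = time.perf_counter()
    with Pool(processes=6) as pool:
        pooled = sum(pool.map(integration_process, integration_ranges))
    print("pooled:", pooled, "took", time.perf_counter() - start, "s")

if __name__ == "__main__":
    main()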

EDIT: Since they want to do infinite integration, I'll also post my thoughts and an addendum to the code on that here, instead of leaving it buried in the comments.

Technically, even with an infinite integration range you aren't actually integrating infinitely; the specific numerical methods for approximating an infinite integral are beyond the scope of this question. However, since scipy.integrate.quad uses Gaussian quadrature to carry out its integration (hence the name 'quad'), it handles this and can take np.inf as a bound. Unfortunately I don't know how to guarantee consistent performance with this bound: it may take longer to do that bound than all of the rest of the integrations, or it may take much less time, which means dividing the work into equal chunks becomes harder. However, you would only need to change the last bound on the integration ranges to also include infinity.

That change looks like this:

integration_ranges = [(i * integration_step, (i + 1) * integration_step) for i in range(integration_chunks)]
# we take the last element of the array, and all but the last element of the tuple, 
# and make a new tuple with np.inf as the last element
integration_ranges[-1] = integration_ranges[-1][:-1] + (np.inf,)

After doing this, your last chunk is bounded by infinity, so your total integration range will actually be 0 -> inf, even though total_integration_range itself isn't infinite.
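As a quick sanity check of this pattern, here is a toy integrand of my own (not from the original post) whose 0 -> inf integral is known exactly:

import numpy as np
from scipy import integrate

def decaying(x):
    # the integral of exp(-x) over [0, inf) is exactly 1
    return np.exp(-x)

integration_step = 10.0
integration_ranges = [(i * integration_step, (i + 1) * integration_step) for i in range(6)]
# swap the final upper bound for infinity, as above
integration_ranges[-1] = integration_ranges[-1][:-1] + (np.inf,)

total = sum(integrate.quad(decaying, lower, upper)[0] for lower, upper in integration_ranges)
print(total)  # ~1.0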

Krupip
  • smin is just a float though, so does that affect it? – PhysicistAbroad May 18 '17 at 16:52
  • I've also updated my function above to define chunkwise – PhysicistAbroad May 18 '17 at 16:54
  • Oh, and thanks for putting the return in! I had taken it out for debugging purposes since I didn't want it calling any further functions until I had this working correctly – PhysicistAbroad May 18 '17 at 16:55
  • I just ran the above code and I get the output twice - is that to be expected? – PhysicistAbroad May 18 '17 at 17:02
  • @PhysicistAbroad what do you mean you get the output twice? What I put as the output is what I get. `arg = ...` is printed twice because the function that prints it gets used in two processes and is called twice (since the size of your tuple array is 2) – Krupip May 18 '17 at 17:04
  • That makes sense - is it possible to get two different answers based on different input tuples? Running the same process on two cores unfortunately doesn't speed up my computation. I rewrote it so it adds tupl[0] and tupl[1] and my output is then [30000.0,50000.0], which is printed twice. Is it possible to get one core to return the 30000.0 and the second to return 50000.0? – PhysicistAbroad May 18 '17 at 17:14
  • What do you mean different answers based on different input tuples? Currently with this code, there isn't any computation actually going on, so unless you show us what you are actually doing we can't help explain the situation. All you do is print out values, or in my modification, print out and also take the first element out of the tuples. What exactly are you trying to do? You are going to have to be more specific. – Krupip May 18 '17 at 17:21
  • I'm trying to break up a numerical integral into chunks to try to reduce my computation time. This can be seen in my original code - what I want to do is pass tuples which are the integration limits, independently do the integrations, and then sum each part to give me a final result for the total integral – PhysicistAbroad May 18 '17 at 17:22
  • @PhysicistAbroad then why don't you show that in your code? Currently no such integration is happening in `sint`. You need to put the code for integration in your `sint` function. Note that the mapped function can only accept one value, so you will need a function partial to supply the array you are using across all processes, or simply pass the chunks instead of limits and an array (see the sketch after this list). – Krupip May 18 '17 at 17:28
  • The function that's being integrated is a few hundred lines, so I'm not including it here - check out what's commented out in my original sint function, it's all there. – PhysicistAbroad May 18 '17 at 17:35
  • @PhysicistAbroad see my updated answer. If your function is complicated enough and the integration is large enough the overhead of multiprocessing should be low enough for it to be faster; note that printing out within threads causes slowdown you don't want, so outside of debugging I would encourage you not to print. – Krupip May 18 '17 at 17:58
  • This is great! I'll try to implement it with integration to infinity, but I expect it'll work and then I'll accept the answer! And thanks for the tips, I'm not a programmer by training! – PhysicistAbroad May 18 '17 at 18:13
  • Thanks so much for all the help @snb, is it possible to implement a method like this to an infinite integration range? – PhysicistAbroad May 18 '17 at 18:44
  • Technically even with infinite integration range, you aren't actually integrating infinitely; the specific numerical methods of integrating infinitely are beyond the scope of this question. However, since `scipy.integrate.quad` uses [Gaussian Quadrature](http://stackoverflow.com/a/19724665/2036035) to carry out its integration, it handles this and can take np.inf as a bound. Unfortunately I don't know how to guarantee consistent performance with this bound; however, you would only need to change the last bound on the integration ranges to also include infinity in the range. – Krupip May 18 '17 at 18:51
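For completeness, here is a sketch of the functools.partial approach mentioned a few comments up: pool.map hands the mapped function exactly one argument, so any fixed parameters have to be bound first (square below is only a stand-in integrand):

from functools import partial
from multiprocessing import Pool
from scipy import integrate

def square(x):
    # stand-in for the real integrand
    return x ** 2

def integration_process(integration_range, integrand):
    lower, upper = integration_range
    y, err = integrate.quad(integrand, lower, upper)
    return y

def main():
    integration_ranges = [(i * 10000.0, (i + 1) * 10000.0) for i in range(6)]
    # bind the integrand so the mapped function takes a single tuple argument;
    # the bound object must be picklable, so use a module-level function, not a lambda
    mapper = partial(integration_process, integrand=square)
    with Pool(processes=6) as pool:
        return sum(pool.map(mapper, integration_ranges))

if __name__ == "__main__":
    print(main())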