
I know there are quite a few questions about speeding up for loops, especially using the multiprocessing tools. I have checked lots of them but I still haven't found anything that helps me solve my problem.

I have a piece of code that I really need to speed up, as I have to run it millions of times.

One of the bottlenecks seems to be a for loop with 8 iterations nested inside the main for loop. This inner loop iterates over a list of strings and calls a rather complicated function, which in turn calls several other functions that I am already speeding up using numba. The calculation in each iteration does not depend on the previous ones, so the 8 calls can be done independently. I therefore guess this part of my code could be up to 8x faster than it is at the moment.

I have timed this function using %timeit and it usually takes less than 80ms to run (there's still room to improve this, but that's another story), so the whole for loop takes ~0.65 seconds to run.

I have tried using the multiprocessing module but, quite oddly, it seems to make the problem a lot worse: the loop then takes more than 5 seconds to run.

Simplified version of my code:

import numpy as np

final_results = []

for k in range(int(5e6)):  # range() needs an int, not the float 5e6

    # do some calculations

    results = []
    for char in char_list:
        results.append(value_calculation(char, args1, arg2, arg3, arg4))

    mean_result = np.mean(results)

    # store the mean of this iteration (inside the main loop)
    final_results.append(mean_result)

Each iteration of the main loop takes about 1 second, which I don't understand, because both the code before and after the small loop take less than 1ms to run.
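
One way to see where that second goes would be to profile a single iteration of the main loop. The following is only a sketch: `value_calculation` here is a hypothetical stand-in doing arbitrary CPU-bound work, and `one_iteration` and `profile_once` are names made up for illustration.

```python
import cProfile
import io
import pstats


def value_calculation(char, n=20_000):
    # Hypothetical stand-in for the real function: arbitrary CPU-bound work.
    return sum((ord(char) * i) % 7 for i in range(n))


def one_iteration(char_list):
    # Mirrors the inner loop of the question for one pass of the main loop.
    results = [value_calculation(char) for char in char_list]
    return sum(results) / len(results)


def profile_once(char_list):
    # Profile one pass and return the report as text.
    profiler = cProfile.Profile()
    profiler.enable()
    one_iteration(char_list)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


report = profile_once(list("abcdefgh"))
print(report)
```

Sorting by cumulative time should show whether the time is really spent inside `value_calculation` or somewhere else in the iteration.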

As I mentioned, I have tried using the multiprocessing module like this:

from multiprocessing import Pool

def arg_wrapper(args):
    return value_calculation(*args)

p = Pool(processes=4)
%timeit p.map(arg_wrapper, ((char, args1, arg2, arg3, arg4) for char in char_list))

This gives me a time of 5.5 seconds!! Ideally, the whole for loop should take less than 0.1 second.
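
One possible cause, and this is only an assumption because the sizes of the arguments are not shown, is that every task has to serialize its arguments to send them to a worker process. A rough way to check is to pickle one task's arguments by hand and look at the size and the time. In this sketch `pickling_cost` is a made-up helper and the argument values are hypothetical stand-ins.

```python
import pickle
import time


def pickling_cost(task_args):
    """Return (bytes, seconds) needed to pickle one task's arguments."""
    start = time.perf_counter()
    payload = pickle.dumps(task_args)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed


# Hypothetical stand-ins: the real args1..arg4 are not shown in the question.
args1 = bytes(8_000_000)  # pretend this is an 8 MB array
arg2, arg3, arg4 = 1.0, 2.0, 3.0
char_list = list("abcdefgh")

# Measure the tuple that would be built for a single task.
n_bytes, seconds = pickling_cost((char_list[0], args1, arg2, arg3, arg4))
total_mb = n_bytes * len(char_list) / 1e6
print(f"{n_bytes} bytes per task, {seconds * 1e3:.2f} ms to pickle")
print(f"about {total_mb:.0f} MB sent to workers per p.map call")
```

If the payload turns out to be large, one option might be to build the big arguments once per worker, for example through the `initializer` argument of `Pool`, instead of sending them with every task.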

This is my first attempt at parallel coding so, besides the multiprocessing module that I have seen in many other questions, I have no idea what else I can use to run this for loop in parallel and speed up my code.

I should also mention that I normally use a Linux machine (though this should work on Windows as well) and Python 3.6 on Spyder 3.2.8.

Edit: It seems my desktop computer has 4 cores.
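
As a rough sanity check of what 4 cores could give in the best case, here is a small sketch using the figures above (8 calls of about 80 ms each). `best_case_loop_time` is a made-up helper, and it ignores all process and communication overhead.

```python
import math
import os


def best_case_loop_time(n_tasks, seconds_per_task, n_workers):
    # Tasks run in waves of n_workers; each wave lasts as long as one task.
    waves = math.ceil(n_tasks / n_workers)
    return waves * seconds_per_task


# Logical core count as seen by Python; cpu_count() may return None.
n_cores = os.cpu_count() or 1
print("logical cores:", n_cores)

# Figures from the question: 8 tasks of about 80 ms each.
serial = best_case_loop_time(8, 0.080, 1)
four_workers = best_case_loop_time(8, 0.080, 4)
eight_workers = best_case_loop_time(8, 0.080, 8)
print(f"1 worker:  {serial:.2f} s")
print(f"4 workers: {four_workers:.2f} s")
print(f"8 workers: {eight_workers:.2f} s")
```

Under these assumptions, 4 workers need two waves of 80 ms, so about 0.16 s at best. Getting under 0.1 second would need either 8 cores or a faster `value_calculation`.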

Xay
  • " I am running the code on a desktop computer with only one core". How do you expect a speedup then? – xrisk May 15 '19 at 11:32
  • If you have one single available core, any attempt to use multiprocessing for a computation expensive task can only increase time because the single core will have to do all the job, plus the overhead required by the multiprocessing. – Serge Ballesta May 15 '19 at 11:32
  • As for "Each iteration of the main loop takes about 1 second, which I don't understand, because both the code before and after the small loop take less than 1ms to run." [profile](https://docs.python.org/3/library/profile.html) your code. – xrisk May 15 '19 at 11:34
  • Create an `.so` file for the calculation; you will get code `5X` faster than yours (`CDLL`). – dsgdfg May 15 '19 at 12:36
  • OK, so I THOUGHT I had a single core. I double-checked using `getconf _NPROCESSORS_ONLN` and it tells me I actually have 4 cores (I got the same result trying some other options from here: https://stackoverflow.com/questions/6481005/how-to-obtain-the-number-of-cpus-cores-in-linux-from-the-command-line/18051445#18051445) – Xay May 15 '19 at 12:49
  • @dsgdfg, could you please elaborate? I don't know how to do that. – Xay May 15 '19 at 12:55
  • Please see edits in the last 3 lines of the question. – Xay May 16 '19 at 13:49
  • [Go to URL](https://github.com/numpy/numpy), click the colored bar above the `Download` link, and see which languages provide the calculation (you will see the weight of `C`; this C source is very clear for starting to `write a module`). You used a `high level language` (a high-level language has a gap to the system (OS), and all apps are **UNSTABLE**). I've only used **numpy** once, then I turned to the C language (I learned enough to write my own modules). Computers are not eligible to perform calculations. Processors make calculations very fast. That's why the code you write is so fast that it uses fewer emulator. – dsgdfg May 22 '19 at 12:06
