
This is probably extremely simple for people who know this topic, but I am not one of them. I have searched a lot about multiprocessing, but it only makes me more confused...

I need to process about 160 data files independently. I have a function to process the data, say f(arg1, arg2). My computer's CPU is an i7-3770 (4 cores, 8 threads). I was wondering if I can open 8 ipython qt consoles and run this same function (by copying the function into each console) with different values for arg1 and arg2 at the same time?

Or is there a very, very simple example of doing such a task using multiprocessing in Python?

I know very little about coding; I merely use pandas, numpy and scipy to process data. I am using Anaconda as my Python environment.

Thank you so much for your help!

user3576212
  • Check the [`multiprocessing`](https://docs.python.org/2/library/multiprocessing.html) module. In particular read about [pool of workers](https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers). You don't have to run multiple qtconsoles to execute code in parallel just use [`Pool.map`](https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map) – Bakuriu Apr 27 '14 at 18:53
  • thanks, i know i don't have to, but can i? because to me, that's the most straightforward way... – user3576212 Apr 27 '14 at 18:55

2 Answers


The multiprocessing module is meant for this use case.

A simple complete example of its usage is:

import multiprocessing

def my_function(x):
    """The function you want to compute in parallel."""
    x += 1
    return x


if __name__ == '__main__':
    pool = multiprocessing.Pool()
    results = pool.map(my_function, [1,2,3,4,5,6])
    print(results)

The call to pool.map will execute my_function with argument 1, then 2, and so on, but in parallel across the worker processes.

Note that my_function takes only one argument. If you have a function f that takes n arguments, simply write a helper function f_helper:

def f_helper(args):
    """Unpack the tuple of arguments and forward them to f."""
    return f(*args)

And pack each set of arguments into a tuple. For example:

results = pool.map(f_helper, [(1,2,3), (4,5,6), (7,8,9)])

is equivalent to:

[f(1, 2, 3), f(4, 5, 6), f(7, 8, 9)]

but the calls to f are executed in parallel.
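
Putting the pieces together, here is a minimal runnable sketch; the body of f is just a placeholder (it sums its three arguments), standing in for whatever your real function does:

import multiprocessing

def f(x, y, z):
    """Placeholder three-argument function: just sums its arguments."""
    return x + y + z

def f_helper(args):
    """Adapter that unpacks one tuple into the three arguments of f."""
    return f(*args)

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    results = pool.map(f_helper, [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
    print(results)  # prints [6, 15, 24]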


Note: since the code will run in a different process, any side effect of f won't be preserved. For example, if you modify the original argument, the main process will not see the change. Think of the arguments as being copied and passed to the child process, which computes the result, which is then copied back into the main process.
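
A small sketch illustrating this (the function and variable names are made up for the example): the child mutates its copy of the argument, and only the returned value makes it back:

import multiprocessing

def append_flag(lst):
    """Mutate the (copied) argument in the child process and return it."""
    lst.append(99)
    return lst

if __name__ == '__main__':
    data = [[1], [2]]
    pool = multiprocessing.Pool()
    results = pool.map(append_flag, data)
    print(results)  # [[1, 99], [2, 99]]: the values copied back from the children
    print(data)     # [[1], [2]]: the originals in the main process are unchanged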

If the function you are trying to compute doesn't take long enough, copying the arguments and the return value can take more time than running the code serially.
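
If you do have many cheap work items, map's optional chunksize argument can reduce some of that overhead by shipping the items to the workers in batches instead of one at a time. A sketch, with an arbitrary batch size chosen for illustration:

import multiprocessing

def cheap(x):
    """A function so cheap that per-item overhead would dominate."""
    return x * 2

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    # chunksize=1000 hands the items to the workers 1000 at a time,
    # cutting down on inter-process communication.
    results = pool.map(cheap, range(10000), chunksize=1000)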


The documentation contains various examples of how to use the module.

Bakuriu
  • Thank you Bakuriu! I didn't understand the f_helper function: if the original function is f(x,y,z), how do you write f_helper in this case? Sorry, I really know very little about this. – user3576212 Apr 27 '14 at 19:45
  • @user3576212 The `f_helper` I wrote works with *any* number of arguments. If `args` is a tuple then `f(*args)` will call `f` with the elements of the tuple as arguments. I.e. `t = (1, 2, 3)` then `f(*t) == f(1, 2, 3)`. See also [this](http://stackoverflow.com/questions/2238355/what-is-the-pythonic-way-to-unpack-tuples) question. The `f_helper` effectively "transforms" the `f` function into a function that takes a single argument instead of `n`. – Bakuriu Apr 27 '14 at 20:18
  • Hi, it's me again. I tried out the code you wrote, but I ran into a problem. Can I modify the last step into `if __name__ == '__main__': pool = multiprocessing.Pool() pool.map(last_helper, [(1980,1981),(1981,1982)])`? Because my function doesn't return anything. But when I use this code, it's not working: the kernel is busy all the time but doesn't give me anything. – user3576212 Apr 28 '14 at 03:09
  • I got it, it turns out I can't run this in ipython or another interactive interpreter. If I run the whole script, it works. Any idea how to make this work in ipython? – user3576212 Apr 28 '14 at 04:06
  • @user3576212 I can run this fine inside the qtconsole. Note that if you are on windows you *must* put the `if __name__ == '__main__'` guard. See the recommendations about [Windows](https://docs.python.org/2/library/multiprocessing.html#windows). If you don't put that guard you can get an infinite loop. If this doesn't help you you should provide a (possibly stripped down) example of what you are running. – Bakuriu Apr 28 '14 at 05:08
  • I included the `if __name__` statement, but it doesn't work in the ipython notebook. It works perfectly as a whole script. – user3576212 Apr 29 '14 at 19:41
  • @user3576212 I just tried on my machine and it works fine. Please clarify what you mean by "doesn't work". Do you get an error? Does ipython crash? Does it keep running without providing a result? Does it return an incorrect result? – Bakuriu Apr 29 '14 at 19:45

I am running the exact same code:

import multiprocessing

def my_function(x):
    """The function you want to compute in parallel."""
    x += 1
    return x


if __name__ == '__main__':
    pool = multiprocessing.Pool()
    results = pool.map(my_function, [1,2,3,4,5,6])
    print(results)

in an ipython QT console on Windows. However, just as for the poster above, the code does not work: the QT console just freezes up.

Any solution to this?
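
A likely cause (this is an educated guess, not something the snippet proves): on Windows, multiprocessing starts fresh child processes that must be able to import the worker function by name, and a function defined directly inside an interactive console lives only in that session, so the children hang trying to find it. The usual workaround is to move the function into an importable file; the module name worker.py below is hypothetical:

# worker.py (any importable module on your path; the name is made up)
def my_function(x):
    """The function you want to compute in parallel."""
    return x + 1

Then, in the QT console:

import multiprocessing
from worker import my_function

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    print(pool.map(my_function, [1, 2, 3, 4, 5, 6]))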

user3438258