
I am trying to add multiprocessing to some code which features functions that I cannot modify. I want to submit these functions as jobs to a multiprocessing pool asynchronously. So, given that I cannot modify the functions, how could I specify different timeouts for different functions passed asynchronously to a multiprocessing process pool? Thanks for any suggestions!


EDIT: This question is not requesting code; it is requesting suggestions and general guidance. A minimal understanding of the problem under consideration is demonstrated (note the correct use of the terms "multiprocessing", "pool" and "asynchronously"). Regarding attempted solutions, how can one present attempted solutions if one does not know how to proceed?

Further, as the first response explains, there appears to be no way to accomplish that which is asked. In other words, any attempt at a solution would fail.

d3pd
    It would really help if you gave us some actual code. And if you responded in any way to the answers on your previous questions so we knew whether you understood or not, and which choice you were going to go with, and so on. As it is, you're asking very vague questions that can only get very vague answers, and I suspect many people aren't bothering to answer at all because they don't even know if you're reading. – abarnert Dec 14 '13 at 01:03
  • The code is still being composed and the purpose of my multiprocessing questions is to direct me in my coding efforts. Thank you very much for [your recent response](http://stackoverflow.com/a/20577608/1556092). As my coding suggests, I have no issue with applying tasks asynchronously; I have no issue with asking multiple questions at once. This question concerns timeouts; [the previous question](http://stackoverflow.com/questions/20577472/how-to-keep-track-of-results-asynchronous-results-returned-from-a-multiprocessin) concerns the management of asynchronous results - two different topics. – d3pd Dec 14 '13 at 01:16
  • I try not to rush my evaluation of suggested courses of action. I want to take the time to implement suggestions so that I can ask rational follow-up questions and evaluate a response fairly and accurately. I thank you very sincerely for your help in this question and the previous one and I ask you to note that any delay in my response arises out of my respectful, unhasty evaluation of your responses. – d3pd Dec 14 '13 at 01:20
  • [pebble](https://pypi.python.org/pypi/Pebble) library allows to set a timeout when you're scheduling a task. If the task takes more than the given timeout to complete, the worker is restarted and the task is discarded. – noxdafox Jul 02 '15 at 13:03

1 Answer


There is really no way to put a timeout on an individual task in a multiprocessing pool, or to abort a task once it has started, in the first place. You can only terminate the entire pool.

Which obviously means there's also no way to put different timeouts on each task.

The only way to really do that is to run each one in a process that you can kill if it oversteps the timeout.

The simplest way to do that is to have a thread in the main process for each child process, so the thread can block on `proc.join(timeout)`, then call `proc.terminate()` if `proc.is_alive()` is still true.
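A minimal sketch of that thread-per-process arrangement (`slow_task` and the specific timeout values are hypothetical stand-ins for the unmodifiable functions):

```python
import multiprocessing
import threading
import time

def slow_task(seconds):
    # Stand-in for one of the functions that cannot be modified.
    time.sleep(seconds)

def run_with_timeout(target, args, timeout):
    # Run target in its own process; kill it if it exceeds timeout.
    proc = multiprocessing.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return False  # timed out and was killed
    return True       # finished within its timeout

if __name__ == "__main__":
    # One watcher thread per child process, each with its own timeout.
    results = {}
    def watch(name, seconds, timeout):
        results[name] = run_with_timeout(slow_task, (seconds,), timeout)
    threads = [
        threading.Thread(target=watch, args=("fast", 0.1, 2.0)),
        threading.Thread(target=watch, args=("slow", 5.0, 0.5)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)  # e.g. {'fast': True, 'slow': False}
```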

The fact that you have to use a `Process` rather than a `Pool` or a `ProcessPoolExecutor` means you have to pass return values back manually, which is a pain. To avoid that, you could use a single-process pool/executor, submit the single job, wait for the `AsyncResult`/future with a timeout, and terminate the pool/executor if it times out, but that seems a little clumsy for different reasons.
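The single-process-pool variant might look like this (again with a hypothetical `slow_task` standing in for the real function; a timed-out task returns `None` here, but the caller is free to signal timeouts however it likes):

```python
import multiprocessing
import time

def slow_task(seconds):
    # Stand-in for an unmodifiable function.
    time.sleep(seconds)
    return seconds

def run_in_fresh_pool(func, args, timeout):
    # A pool of one worker per task: wait on the AsyncResult with a
    # task-specific timeout, and terminate the whole (one-process)
    # pool if the task overruns.
    pool = multiprocessing.Pool(processes=1)
    try:
        result = pool.apply_async(func, args)
        try:
            return result.get(timeout)
        except multiprocessing.TimeoutError:
            return None  # caller decides what a timeout means
    finally:
        pool.terminate()
        pool.join()

if __name__ == "__main__":
    print(run_in_fresh_pool(slow_task, (0.1,), 2.0))  # 0.1
    print(run_in_fresh_pool(slow_task, (5.0,), 0.5))  # None
```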

Either way, once you've got threads that can wait on single-process tasks with a timeout, you just toss the threads into a pool/executor of ncpu workers and let it do the work for you.
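Putting the pieces together, a sketch of that final arrangement: a thread pool of ncpu workers, each supervising one disposable single-worker process pool that it can terminate. (Names and timeout values are illustrative; creating process pools from several threads at once works for a sketch like this, but has fork-related caveats in production code.)

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
import time

def slow_task(seconds):
    # Stand-in for an unmodifiable function.
    time.sleep(seconds)
    return seconds

def killable_call(func, args, timeout):
    # One disposable single-worker pool per task, so the task can be
    # aborted by terminating its pool.
    pool = multiprocessing.Pool(processes=1)
    try:
        try:
            return pool.apply_async(func, args).get(timeout)
        except multiprocessing.TimeoutError:
            return None
    finally:
        pool.terminate()
        pool.join()

if __name__ == "__main__":
    # ncpu threads, each waiting on one single-process task at a time;
    # the (args, timeout) pairs give every task its own deadline.
    jobs = [((0.1,), 2.0), ((5.0,), 0.5), ((0.2,), 2.0)]
    with ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as ex:
        results = list(ex.map(lambda j: killable_call(slow_task, *j), jobs))
    print(results)  # [0.1, None, 0.2]
```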

abarnert
  • Thank you very much for your suggestions. For effective handling of timeouts, I think it is likely that individual processes should be spawned as opposed to the use of pools. However, I am experimenting with the `get()` method of `multiprocessing.pool.ApplyResult`. Specifically, I am storing the `ApplyResult` object as an attribute of each job and then passing a job-specific timeout to its `get()` method. – d3pd Dec 14 '13 at 14:08
  • @d3pd: The reason I suggested considering single-process pools/executors was to make it easier to get the results back from the tasks. There are other ways to do that—a queue, a pipe, a shared list with a lock, etc.—but having to deal with all the complexities (including passing exceptions across process boundaries) can be painful. Also, being able to reuse the same process repeatedly (until it times out and you have to kill it) saves you the process startup costs, which can be pretty hefty on Windows if your tasks are short. Then again, as I said, there's also clumsiness the other way. – abarnert Dec 16 '13 at 18:57
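The per-job `get()` timeout described in the comments above can be sketched as follows (`slow_task` is a hypothetical stand-in for an unmodifiable function). Note the caveat: a timed-out `get()` only abandons the wait; the overrunning task keeps occupying its worker process until the whole pool is terminated.

```python
import multiprocessing
import time

def slow_task(seconds):
    # Stand-in for an unmodifiable function.
    time.sleep(seconds)
    return seconds

def get_with_timeouts(job_specs):
    # job_specs: list of (sleep_seconds, timeout_seconds) pairs.
    pool = multiprocessing.Pool(processes=2)
    try:
        # Keep each ApplyResult alongside its job-specific timeout.
        pending = [(pool.apply_async(slow_task, (s,)), t)
                   for s, t in job_specs]
        outcomes = []
        for result, timeout in pending:
            try:
                outcomes.append(result.get(timeout))
            except multiprocessing.TimeoutError:
                # Only the wait stops; the worker keeps running.
                outcomes.append(None)
        return outcomes
    finally:
        pool.terminate()  # the only way to actually stop overrunning workers
        pool.join()

if __name__ == "__main__":
    print(get_with_timeouts([(0.1, 2.0), (5.0, 0.5)]))  # [0.1, None]
```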