
The following code is my first attempt to fail fast when an exception is raised inside the hyperparameter function.

Unfortunately, all of the data is processed before the caller receives the exception.

How can I terminate the whole process immediately when the called function raises an error, so that I can fix bugs in my code faster instead of waiting until all parameter combinations have been processed/optimized?

The code:

from sklearn.model_selection import ParameterGrid
from multiprocessing import Pool

var1 = 'var1'
var2 = 'var2'
abc = [1, 2]
xyz = list(range(100_000))
pg = [{'variant': [var1],
       'abc': abc,
       'xyz': xyz, },
      {'variant': [var2],
       'abc': abc, }]
parameterGrid = ParameterGrid(pg)
myTemp = list(parameterGrid)

print('len(parameterGrid):', len(parameterGrid))


def myFunc(myParam):
    if myParam['abc'] == 1:
        raise ValueError('error thrown')
    print(myParam)


# The guard is needed when workers are started with 'spawn' (default on Windows and macOS)
if __name__ == '__main__':
    pool = Pool(1)
    myList = pool.map(myFunc, parameterGrid)

Which results in:

len(parameterGrid): 200002
{'abc': 2, 'variant': 'var1', 'xyz': 2}
{'abc': 2, 'variant': 'var1', 'xyz': 3}
{'abc': 2, 'variant': 'var1', 'xyz': 4}
{'abc': 2, 'variant': 'var1', 'xyz': 5}
{'abc': 2, 'variant': 'var1', 'xyz': 6}
.
.
.
{'abc': 2, 'variant': 'var1', 'xyz': 99992}
{'abc': 2, 'variant': 'var1', 'xyz': 99993}
{'abc': 2, 'variant': 'var1', 'xyz': 99994}
{'abc': 2, 'variant': 'var1', 'xyz': 99995}
{'abc': 2, 'variant': 'var1', 'xyz': 99996}
{'abc': 2, 'variant': 'var1', 'xyz': 99997}
{'abc': 2, 'variant': 'var1', 'xyz': 99998}
{'abc': 2, 'variant': 'var1', 'xyz': 99999}
ValueError: error thrown
user7468395
  • You'll need `pool.apply_async()` with error-callback here: [Python multiprocessing: abort map on first child error](https://stackoverflow.com/a/52285247/9059420) – Darkonaut Jul 23 '19 at 16:59
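The comment above suggests `apply_async()` with an `error_callback`. A minimal sketch of that idea, assuming CPython's `multiprocessing.Pool`: every parameter set is submitted as its own task, and the callbacks (which run in a thread of the parent process) only record the outcome and set a `threading.Event`. The main thread waits on that event and then leaves the `with` block, whose `__exit__` calls `pool.terminate()`. Calling `pool.terminate()` directly inside the callback is the other common variant; here it is kept in the main thread as a design choice. `my_func` and `run_with_abort` are hypothetical names.

```python
import threading
from multiprocessing import Pool


def my_func(param):
    # Stand-in for the question's myFunc.
    if param['abc'] == 1:
        raise ValueError('error thrown')
    return param


def run_with_abort(params, processes=2):
    """Return (results, first_error); the pool is terminated on the first error."""
    if not params:
        return [], None
    results, errors = [], []
    finished = threading.Event()  # set on the first error or when all tasks succeed

    # Both callbacks run in the parent's result-handler thread, one at a time.
    def on_success(value):
        results.append(value)
        if len(results) == len(params):
            finished.set()

    def on_error(exc):
        if not errors:
            errors.append(exc)  # keep only the first failure
        finished.set()

    with Pool(processes) as pool:  # __exit__ calls pool.terminate()
        for param in params:
            pool.apply_async(my_func, (param,),
                             callback=on_success, error_callback=on_error)
        finished.wait()  # returns as soon as one task has failed
    return results, (errors[0] if errors else None)


if __name__ == '__main__':
    grid = [{'abc': abc, 'xyz': xyz} for abc in (2, 1) for xyz in range(10_000)]
    done, error = run_with_abort(grid)
    print(len(done), 'tasks finished before the abort:', repr(error))
```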

2 Answers


As far as I can see, not all of the data is processed: only the combinations with 'abc' = 2 get through. As soon as myFunc receives parameters with 'abc' = 1, it raises the exception. That looks right, doesn't it? You can check your whole parameterGrid before running map and keep only the values that are valid for you:

myTemp_2 = filter(lambda x: x['abc'] != 1, myTemp)
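The partial output follows from how `Pool.map` batches its input. The sketch below reproduces it without starting any processes, assuming CPython's default chunking (`chunksize = ceil(len(iterable) / (4 * processes))`) and sklearn's `ParameterGrid` order (keys sorted, last key varying fastest). Each chunk runs as one task, an exception discards only the rest of its own chunk, and `map` raises only after every chunk has finished, which is why the error appears at the very end. `build_grid`, `default_chunksize` and `simulate_map` are hypothetical helpers.

```python
from itertools import product


def build_grid():
    # Mimics sklearn's ParameterGrid order: keys sorted, last key varies fastest.
    specs = ({'variant': ['var1'], 'abc': [1, 2], 'xyz': range(100_000)},
             {'variant': ['var2'], 'abc': [1, 2]})
    grid = []
    for spec in specs:
        keys, values = zip(*sorted(spec.items()))
        grid.extend(dict(zip(keys, combo)) for combo in product(*values))
    return grid


def default_chunksize(n_items, n_processes):
    # The formula CPython's Pool.map uses when no chunksize is passed.
    chunksize, extra = divmod(n_items, n_processes * 4)
    if extra:
        chunksize += 1
    return chunksize


def simulate_map(items, n_processes=1):
    """Return (params myFunc would print, chunksize), processing chunks in order like Pool(1)."""
    size = default_chunksize(len(items), n_processes)
    printed = []
    for start in range(0, len(items), size):
        for param in items[start:start + size]:
            if param['abc'] == 1:  # myFunc raises: the rest of this chunk is skipped
                break
            printed.append(param)
    return printed, size


grid = build_grid()
printed, size = simulate_map(grid)
print(len(grid), size)                        # 200002 50001
print(printed[0]['xyz'], printed[-1]['xyz'])  # 2 99999
```

With `Pool(1)` the 200002 combinations become four chunks of 50001. The first two start with an 'abc' = 1 item and fail on it, the third starts at xyz = 2, and the fourth stops at the first var2 item, which matches the range printed in the question.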

GolovDanil

To terminate the whole pool of processes immediately (I assume you only need this for testing purposes):

...
def myFunc(myParam):
    if myParam['abc'] == 1:
        print('error occurred')
        pool.terminate()    # accessed globally
    print(myParam)

if __name__ == '__main__':
    pool = Pool(1)
    myList = pool.map(myFunc, parameterGrid)

https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.terminate
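`Pool.terminate()` only has an effect when it is called in the process that owns the pool; the comments below discuss why the `pool` object is not usable inside the workers. A minimal alternative sketch that keeps `terminate()` in the parent: iterate over `imap_unordered()` with `chunksize=1`, so each result, including an exception, arrives as soon as its task finishes. The first exception leaves the loop, and leaving the `with` block terminates the pool. `my_func` and `run_fail_fast` are illustrative names, not part of the question's code.

```python
from multiprocessing import Pool


def my_func(param):
    # Stand-in for the question's myFunc.
    if param['abc'] == 1:
        raise ValueError('error thrown')
    return param


def run_fail_fast(params, processes=2):
    """Return (results, first_error); stops at the first exception a worker reports."""
    results = []
    with Pool(processes) as pool:  # leaving the block calls pool.terminate()
        try:
            # chunksize=1: every task is reported individually, so an exception
            # reaches this loop as soon as the failing task has finished.
            for value in pool.imap_unordered(my_func, params, chunksize=1):
                results.append(value)
        except ValueError as exc:  # or drop the try/except to see the full traceback
            return results, exc
    return results, None


if __name__ == '__main__':
    grid = [{'abc': abc, 'xyz': xyz} for abc in (2, 1) for xyz in range(10_000)]
    done, error = run_fail_fast(grid)
    print(len(done), 'results before stopping:', repr(error))
```

`chunksize=1` adds per-task overhead; a larger chunksize is cheaper, but an error is then only reported once its whole chunk has finished.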

RomanPerekhrest
  • I'm probably missing something, but *# accessed globally*: how? With start method `spawn` no pool is created in the worker process. With `fork` and `forkserver` there is an incompletely initialized pool that is not assigned to the name *pool* (or any other name) in the workers. – shmee Jul 23 '19 at 14:18
  • @shmee, read about what happens to child processes under the `if __name__ == '__main__':` guard: *On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource*. But that applies mostly to Unix (I'm not considering Windows here). – RomanPerekhrest Jul 23 '19 at 14:34
  • @shmee, here's a similar topic https://stackoverflow.com/a/36962624/3185459 – RomanPerekhrest Jul 23 '19 at 14:39
  • I'm not sure your example applies. The workers are forked during the initialization of the pool. The assignment of the pool object to the name `pool` happens after the pool's `__init__` method completes, when the workers are already alive. The `Event` objects in the first code example of the answer you linked are available in the workers because they were created before the instantiation of the pool. If you move their instantiation to after that of the pool, using them in the function raises a NameError, just as using `pool` does here. – shmee Jul 23 '19 at 15:57
  • In the second code example of that answer, the function that calls `terminate` on the pool is passed as callback. That executes in the main process, hence the pool is fully initialized and assigned to the respective name in that case. – shmee Jul 23 '19 at 15:59
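The `Event` approach mentioned in the comments can be sketched as a shared stop flag. This sketch assumes the flag is handed to each worker through `Pool`'s `initializer`/`initargs` (which also works with the `spawn` start method) instead of a fork-inherited global. The first failing task sets the flag, every later task sees it and returns at once, so `pool.map` still raises only after all chunks are done, but the remaining tasks have become cheap no-ops. `init_worker`, `_stop` and `run_with_stop_flag` are hypothetical names.

```python
import multiprocessing as mp

_stop = None  # per-worker reference to the shared flag, set by init_worker


def init_worker(stop_event):
    # Runs once in every worker process when the pool starts it.
    global _stop
    _stop = stop_event


def my_func(param):
    # Stand-in for the question's myFunc.
    if _stop.is_set():
        return None  # an earlier task already failed: skip the real work
    if param['abc'] == 1:
        _stop.set()  # turn the remaining tasks in all workers into no-ops
        raise ValueError('error thrown')
    return param


def run_with_stop_flag(params, processes=2):
    """Return (results, first_error) using pool.map plus a shared stop flag."""
    stop = mp.Event()  # created before the pool and passed to workers via initargs
    with mp.Pool(processes, initializer=init_worker, initargs=(stop,)) as pool:
        try:
            return pool.map(my_func, params), None
        except ValueError as exc:
            return None, exc


if __name__ == '__main__':
    grid = [{'abc': abc, 'xyz': xyz} for abc in (1, 2) for xyz in range(10_000)]
    results, error = run_with_stop_flag(grid)
    print('error:', repr(error))
```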