I have a data pipeline built with luigi that works perfectly fine when I run it with 1 worker. With more than 1 worker, however, it dies (unexpectedly, with exit code -11) in a stage with 2 dependencies. The code is rather complex, so a minimal example would be difficult to give. The gist is that I am doing the following with gensim:

  1. Building a dictionary from some texts.
  2. Building a corpus from said texts and the dictionary (requires (1)).
  3. Training an LDA model from the corpus and dictionary (requires (1) and (2)).

For some reason, step (3) crashes every time I use more than one worker, even if (1) and (2) have already completed.

Any help would be greatly appreciated!

EDIT: Here is an example of the logging output. TrainLDA is task (3). Two more tasks after it require TrainLDA. All earlier tasks finished correctly. I replaced TrainLDA's arguments with ... to make the output more readable. The additional lines are just print statements we added to help us see what is happening.

DEBUG: Pending tasks: 3
DEBUG: Asking scheduler for work...
INFO: [pid 28851] Worker Worker(salt=514562349, workers=4, host=felipe.local, username=Felipe, pid=28825) running   TrainLDA(...)
INFO: Done
INFO: There are no more tasks to run at this time
INFO: TrainLDA(...) is currently run by worker Worker(salt=514562349, workers=4, host=felipe.local, username=Felipe, pid=28825)
==============================
Corriendo LDA de spanish con nivel de limpieza stopwords
==============================
Número de tópicos: 40
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: TrainLDA(...) is currently run by worker Worker(salt=514562349, workers=4, host=felipe.local, username=Felipe, pid=28825)
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: TrainLDA(...) is currently run by worker Worker(salt=514562349, workers=4, host=felipe.local, username=Felipe, pid=28825)
INFO: Worker task TrainLDA(...) died unexpectedly with exit code -11
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: There are 2 pending tasks possibly being run by other workers
INFO: There are 2 pending tasks unique to this worker
INFO: Worker Worker(salt=514562349, workers=4, host=felipe.local, username=Felipe, pid=28825) was stopped. Shutting down Keep-Alive thread
Felipe Gerard
  • Each step is a `luigi.Task`, by the way – Felipe Gerard Oct 06 '15 at 19:48
  • Does task 3 spawn just a single time? Or a bunch of times in parallel? If it just spawns once, then having multiple workers doesn't help anything anyways. – Charlie Haley Oct 07 '15 at 15:21
  • Just once. Actually they all are. The thing is that before 1, several tasks are processed in parallel (they process the text files), hence the need for more workers. – Felipe Gerard Oct 07 '15 at 15:28
  • And the only thing you get in the error message is unexpectedly quit with error code -11? Is there a longer error message? Are you sure it's -11 and not 11? Seems very odd. – Charlie Haley Oct 07 '15 at 15:33
  • I added the traceback and it is -11. On my Mac I even get a window saying that Python quit unexpectedly... – Felipe Gerard Oct 07 '15 at 23:03
  • Here's another post referencing -11. http://stackoverflow.com/a/3630571/4667484 I don't really have any help on that unfortunately. – Charlie Haley Oct 08 '15 at 00:59
  • Wow that's intense... What do I have a bad build of though? I'm a noob in these matters, you see... – Felipe Gerard Oct 08 '15 at 01:11
  • The next step I would take is to comment out parts of the code that errors out with multiple workers running and see what line is causing it. – Charlie Haley Oct 12 '15 at 03:52
  • Did you make any progress on this? I am experiencing the same problem with Luigi on macOS; might it be a problem with the latter? – Kaleidophon Mar 16 '17 at 11:02
  • Nope, sorry. I haven't used it in a while now. Good luck! – Felipe Gerard Mar 18 '17 at 19:21
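
A note on the exit code discussed in the comments above: when Python reports a negative exit code for a child process, the child was terminated by a signal whose number is the absolute value of that code. Signal 11 is SIGSEGV (a segmentation fault) on Linux and macOS, which would be consistent with the "Python quit unexpectedly" window. The following is a minimal sketch, using only the standard library, of how such a code can be decoded. The helper name `describe_exit_code` is hypothetical and is not part of luigi or gensim.

```python
import signal


def describe_exit_code(code):
    """Describe a child process exit code (hypothetical helper, not a luigi API)."""
    if code is None:
        # multiprocessing and subprocess report None while the child is alive.
        return "still running"
    if code < 0:
        # A negative code means the child was terminated by signal number -code.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-code, name)
    return "exited with status {}".format(code)


print(describe_exit_code(-11))
```

If the crash really is a segmentation fault in native code, calling the standard library's `faulthandler.enable()` (Python 3.3+) at the start of the failing task may print the Python stack at the moment of the crash. That is only a suggestion for narrowing it down, not a confirmed fix for this pipeline.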

0 Answers