
We have some parallel processing code built around Pebble. It has been working robustly for quite some time, but we seem to have run into an odd edge case.

Based on the exception trace (and the rock-simple code feeding it), I suspect it's actually a bug in Pebble, but who knows.

The code feeding the process pool is pretty trivial:

from pebble import ProcessPool

pool = ProcessPool(max_workers=10, max_tasks=10)
for path in filepaths:
  try:
    # Schedule each file for analysis with a 30-second per-task timeout.
    future = pool.schedule(function=self.analyse_file, args=[path], timeout=30)
    future.add_done_callback(self.process_result)
  except Exception as e:
    print("Exception fired: " + str(e))  # NOT where the exception is firing

pool.close()
pool.join()

So in essence, we schedule a bunch of work, close out the pool, then wait for the pool to complete the scheduled tasks. NOTE: the exception is not being thrown in the schedule loop; it fires AFTER we call join().
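The done callback isn't shown here; a simplified sketch of the shape such a callback usually takes with Pebble (not our exact process_result implementation) would be:

# Simplified sketch only -- not the exact process_result implementation.
# Pebble completes the future before invoking the callback, so per-task
# failures and timeouts surface here via future.result().
from concurrent.futures import TimeoutError

def process_result(future):
  try:
    result = future.result()  # raises if the task failed or timed out
    # ... record the result ...
  except TimeoutError:
    print("Task took longer than 30 seconds")
  except Exception as error:
    print("Task raised an exception: " + str(error))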

This is the exception stack trace:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 150, in task_scheduler_loop
    pool_manager.schedule(task)
  File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 198, in schedule
    self.worker_manager.dispatch(task)
  File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/process.py", line 327, in dispatch
    self.pool_channel.send(WorkerTask(task.id, task.payload))
  File "/home/user/.pyenv/versions/scrapeapp/lib/python3.6/site-packages/pebble/pool/channel.py", line 66, in send
    return self.writer.send(obj)
  File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
RuntimeError: dictionary changed size during iteration

I think it's got to be some weird race condition, as the code works flawlessly on some datasets but fails at what appears to be a random point on others.
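The trace ends inside _ForkingPickler.dumps, so the error is raised while Pebble's scheduler thread is pickling the task payload, and pickling any object whose dict another thread grows mid-dump produces exactly this RuntimeError. A toy illustration of that failure mode (not our production code, and not necessarily what is happening inside Pebble):

# Toy illustration of the suspected failure mode -- not our production code.
# Pickling an object while another thread grows one of its dict attributes
# can raise the same "dictionary changed size during iteration" error.
import pickle
import threading

class Holder:
  def __init__(self):
    self.state = {}

holder = Holder()

def mutate():
  for i in range(1000000):
    holder.state[i] = i

thread = threading.Thread(target=mutate)
thread.start()
try:
  while thread.is_alive():
    pickle.dumps(holder)  # races against mutate(); may raise RuntimeError
except RuntimeError as error:
  print(error)  # dictionary changed size during iteration
thread.join()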

We were using pebble 4.3.1 when we first ran into the issue (the same version we'd had since the beginning); upgrading to 4.5.0 made no difference.

Has anybody run into similar issues with Pebble in the past? If so, what was your fix?
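In case it's relevant to a fix: if the race turns out to be between our own threads and that pickling step, one mitigation that comes to mind (purely a sketch, with an illustrative module-level analyse_file_standalone rather than our actual method) is to schedule a plain function with plain arguments, so no shared, mutable object ends up in the pickled payload:

# Hypothetical mitigation sketch (illustrative names, not our actual code):
# schedule a module-level function with plain arguments so that no shared,
# mutable object is part of the pickled task payload.
from pebble import ProcessPool

filepaths = ["/tmp/a.txt", "/tmp/b.txt"]  # example inputs

def analyse_file_standalone(path):
  # ... analysis that relies only on `path` and local state ...
  return path

def record_result(future):
  try:
    print(future.result())
  except Exception as error:
    print("Task failed: " + str(error))

pool = ProcessPool(max_workers=10, max_tasks=10)
for path in filepaths:
  future = pool.schedule(analyse_file_standalone, args=[path], timeout=30)
  future.add_done_callback(record_result)
pool.close()
pool.join()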

  • Should it be `join` then `close`? – tdelaney Feb 24 '20 at 01:39
  • According to the pebble docs: "No more job will be allowed into the Pool, queued jobs will be consumed. To ensure all the jobs are performed **call ProcessPool.join() just after closing the Pool.**" The "close" construct in Pebble is a bit different to what you'd generally expect – jrandodev Feb 24 '20 at 02:57
  • Did you ever figure this one out? – rosstex Apr 07 '21 at 05:11
  • I couldn't reproduce this. – aaron Mar 15 '22 at 06:27
  • @rosstex Did you encounter this? If so, can you share a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example)? – aaron Mar 15 '22 at 06:27

0 Answers