
I'm running joblib in a Flask application inside a Docker container, served by uWSGI (started with threads enabled), which in turn is started by supervisord.

The startup of the webserver shows the following error:

unable to load configuration from from multiprocessing.semaphore_tracker import main;main(15)
/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_multiprocessing_helpers.py:38: UserWarning:

[Errno 32] Broken pipe.  joblib will operate in serial mode

Any idea how to fix this and make joblib run in parallel? Thanks!
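For illustration, a joblib call of roughly this shape inside a Flask view triggers the behaviour described above (a hypothetical, simplified sketch; the route and workload are placeholders, not the actual application code):

from math import sqrt

import joblib
from flask import Flask

# Hypothetical minimal app mirroring the setup: Flask served by uWSGI,
# with joblib using its default process-based backend.
server = Flask(__name__)

@server.route("/")
def index():
    # The default backend spawns worker processes; this is the part that
    # breaks under uWSGI as shown in the logs above.
    results = joblib.Parallel(n_jobs=4)(
        joblib.delayed(sqrt)(i ** 2) for i in range(10)
    )
    return str(results)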


The following packages are installed in the Docker container:

pytest==4.0.1
pytest-cov==2.6.0
flake8==3.6.0
Cython==0.29.3
numpy==1.16.1
pandas==0.24.0
scikit-learn==0.20.2
fancyimpute==0.4.2
scikit-garden==0.1.3
category_encoders==1.3.0
boto3==1.9.86
joblib==0.13.1
dash==0.37.0
dash-renderer==0.18.0
dash-core-components==0.43.1
dash-table==3.4.0
dash-html-components==0.13.5
dash-auth==1.3.2
Flask-Caching==1.4.0
plotly==3.6.1
APScheduler==3.5.3

EDIT

The problem is due to either uWSGI, nginx, or supervisord. Missing rights on /dev/shm are not the issue, as semaphores can be created if I run the Flask server directly. Find below the config files of the three services. Disclaimer: I'm a webserver noob, and the configs were born by copying and pasting from different blogs just to make things work :-D
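For reference, the quick check I mean is simply creating a semaphore from plain Python inside the container (a sketch; it succeeds there, while the same operation fails once joblib is imported under uWSGI):

import multiprocessing

# Succeeds when run with plain python3 inside the container, which shows
# that /dev/shm permissions themselves are fine; this is essentially the
# check joblib performs at import time.
sem = multiprocessing.Semaphore()
print("semaphore created:", sem)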

So here's my uwsgi config:

[uwsgi]
module = prism_dash_frontend.__main__
callable = server

uid = nginx
gid = nginx

plugins = python3

socket = /tmp/uwsgi.sock
chown-socket = nginx:nginx
chmod-socket = 664

# set cheaper algorithm to use, if not set default will be used
cheaper-algo = spare

# minimum number of workers to keep at all times
cheaper = 3

# number of workers to spawn at startup
cheaper-initial = 5

# maximum number of workers that can be spawned
workers = 5

# how many workers should be spawned at a time
cheaper-step = 1
processes = 5

die-on-term = true
enable-threads = true

The nginx config:

# based on default config of nginx 1.12.1
# Define the user that will own and run the Nginx server
user nginx;
# Define the number of worker processes; recommended value is the number of
# cores that are being used by your server
# auto will default to number of vcpus/cores
worker_processes auto;

# altering default pid file location
pid /tmp/nginx.pid;

# turn off daemon mode to be watched by supervisord
daemon off;

# Enables the use of JIT for regular expressions to speed-up their processing.
pcre_jit on;

# Define the location on the file system of the error log, plus the minimum
# severity to log messages for
error_log /var/log/nginx/error.log warn;

# events block defines the parameters that affect connection processing.
events {
    # Define the maximum number of simultaneous connections that can be opened by a worker process
    worker_connections  1024;
}


# http block defines the parameters for how NGINX should handle HTTP web traffic
http {
    # Include the file defining the list of file types that are supported by NGINX
    include /etc/nginx/mime.types;
    # Define the default file type that is returned to the user
    default_type text/html;

    # Don't tell nginx version to clients.
    server_tokens off;

    # Specifies the maximum accepted body size of a client request, as
    # indicated by the request header Content-Length. If the stated content
    # length is greater than this size, then the client receives the HTTP
    # error code 413. Set to 0 to disable.
    client_max_body_size 0;

    # Define the format of log messages.
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';

    # Define the location of the log of access attempts to NGINX
    access_log /var/log/nginx/access.log  main;

    # Define the parameters to optimize the delivery of static content
    sendfile       on;
    tcp_nopush     on;
    tcp_nodelay    on;

    # Define the timeout value for keep-alive connections with the client
    keepalive_timeout  65;

    # Define the usage of the gzip compression algorithm to reduce the amount of data to transmit
    #gzip  on;

    # Include additional parameters for virtual host(s)/server(s)
    include /etc/nginx/conf.d/*.conf;
}

The supervisord config:

[supervisord]
nodaemon=true

[program:uwsgi]
command=/usr/bin/uwsgi --ini /etc/uwsgi/uwsgi.ini
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

[program:nginx]
command=/usr/sbin/nginx
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

2nd EDIT

After moving from Python 3.5 to 3.7.2, the nature of the error changed slightly:

unable to load configuration from from multiprocessing.semaphore_tracker import main;main(15)
/usr/local/lib/python3.7/multiprocessing/semaphore_tracker.py:55: UserWarning:

semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.

unable to load configuration from from multiprocessing.semaphore_tracker import main;main(15)

Help really appreciated, this is currently a big blocker for me :-/


3rd EDIT:

HERE on my GitHub account is a minimal, complete, and verifiable example.

You can run it easily via make build followed by make run.

It will display the following log message:

unable to load configuration from from multiprocessing.semaphore_tracker import main;main(14)

and crash once you visit http://127.0.0.1:8080/ with the following error:

exception calling callback for <Future at 0x7fbc520c7eb8 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 309, in __call__
    self.parallel.dispatch_next()
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 731, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {EXIT(1), EXIT(1), EXIT(1), EXIT(1)}

SmCaterpillar
  • can you provide a [mcve] image? – georgexsh Feb 21 '19 at 01:49
  • That's a good idea, will provide one either later today or tomorrow. – SmCaterpillar Feb 23 '19 at 09:24
  • I added an example hosted on github. It's a bit too big to be displayed here in text form. – SmCaterpillar Feb 23 '19 at 16:30
  • This is a bug caused by the fact that the uwsgi binary embeds the Python interpreter, so `sys.executable` points to the `uwsgi` binary instead of the usual python binary (see the quick check sketched below). Loky could be fixed to detect that situation and try to look up the python command line instead of naively using sys.executable: https://github.com/tomMoral/loky/issues/207 Feel free to submit a PR :) A similar fix could be contributed to the concurrent.futures module in the standard library of Python. – ogrisel Jan 10 '20 at 08:30
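A minimal way to see this effect (a sketch added for illustration; run it once with plain Python and once inside a uWSGI worker):

import sys

# Under a normal interpreter this prints the path of the python binary;
# inside a uWSGI worker it typically prints the path of the uwsgi binary,
# which confuses loky when it tries to spawn worker processes.
print("sys.executable =", sys.executable)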

3 Answers


This was quite a rabbit hole.

The joblib issues page on GitHub has similar reports of joblib failing under uWSGI, but most concern the older multiprocessing backend. The new loky backend was supposed to solve these issues.

There was a PR for the multiprocessing backend that solved this issue for uWSGI:

joblib.Parallel(n_jobs=4, backend="multiprocessing")(joblib.delayed(sqrt)(i ** 2) for i in range(10))

But it sometimes failed randomly and fell back to the same issue that the PR above tried to solve.

Further digging revealed that the current default backend, loky, parallelizes over processes (docs). But these processes don't have shared-memory access, so they need serialized, queued communication channels. This is probably why uWSGI fails while gunicorn works.

So I tried switching to threads instead of processes:

joblib.Parallel(n_jobs=4, prefer="threads")(joblib.delayed(sqrt)(i ** 2) for i in range(10))

And it works :)
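For completeness, here is a self-contained version of the snippet, plus a way to force the thread-based backend when the joblib call lives inside a dependency you cannot modify (the context-manager part is an addition to the original answer, using joblib's documented parallel_backend helper):

from math import sqrt

import joblib

# Same call as above, with the imports it needs.
results = joblib.Parallel(n_jobs=4, prefer="threads")(
    joblib.delayed(sqrt)(i ** 2) for i in range(10)
)

# If the Parallel call happens inside a third-party library, the thread
# backend can also be forced from the outside for everything in the block:
with joblib.parallel_backend("threading", n_jobs=4):
    results = joblib.Parallel()(joblib.delayed(sqrt)(i ** 2) for i in range(10))

print(results)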

Manish Dash
  • Many thanks Manish, this helped me find a workaround for a very similar [issue](https://github.com/joblib/joblib/issues/1002). Is there any way to convince joblib used in a dependency to use `prefer="threads"`? – Alex Jan 29 '20 at 07:20
  • This is more than a year after your answer, but the problem still exists in joblib. I too have the same problem with random failures. For now, I have tried using the multiprocessing backend, and it seems to be working. How does using threads help/work, especially since Python is not inherently multithreaded but better suited to multiprocessing? – CodingInCircles Apr 01 '20 at 13:58

Well, I did find an answer to my problem. It solves the issue in terms of being able to run a joblib-dependent library with supervisord and nginx in Docker. However, it is not very satisfying. Thus, I won't accept my own answer, but I am posting it here in case other people have the same problem and need an okay-ish fix.

The solution is to replace uWSGI with gunicorn. Well, at least I now know whose fault it is. I would still appreciate an answer that solves the issue using uWSGI instead of gunicorn.
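For illustration, the switch could look roughly like this in the supervisord config shown above (a sketch; the gunicorn path, socket location, and worker count are assumptions, while the module and callable mirror the uwsgi.ini):

[program:gunicorn]
; replaces the [program:uwsgi] block; nginx (in conf.d, not shown here) must
; then proxy_pass to this socket instead of using uwsgi_pass
command=/usr/local/bin/gunicorn prism_dash_frontend.__main__:server --workers 5 --bind unix:/tmp/gunicorn.sock
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0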

SmCaterpillar
  • Can you verify whether joblib needs synchronous threads to work? uWSGI creates workers on async threads, which causes problems for applications that need synchronous behaviour. I encountered the same problem while deploying my tensorflow models. Gunicorn supports sync workers by default and so is the natural solution. – Manish Dash Feb 25 '19 at 13:20
  • How can I verify or check if joblib needs synchronous threads? – SmCaterpillar Feb 26 '19 at 14:05
  • The correct solution would be to change loky to implement: https://github.com/tomMoral/loky/issues/207#issuecomment-485137208 – ogrisel Jan 10 '20 at 08:26

It seems that semaphoring is not enabled in your image: joblib checks for multiprocessing.Semaphore(), and only root has read/write permission on the shared memory in /dev/shm. Have a look at this question and this answer.

This is the result in one of my containers:

$ ls -ld /dev/shm
drwxrwxrwt 2 root root 40 Feb 19 15:23 /dev/shm

If you are running as non-root, you should change the permissions on /dev/shm. To set the correct permissions, you need to modify /etc/fstab in your Docker image:

none /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
SR_
  • Any idea how? If I do `RUN echo "none /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0" > /etc/fstab` and `RUN ls -ld /dev/shm`, I still get `drwxrwxrwt 2 root root 40 Feb 21 09:21 /dev/shm` – SmCaterpillar Feb 21 '19 at 09:21
  • Could you try to compile and run this [program](https://privatebin.net/?911f678e82fcd100#ukofwegIGEwmRxA4N45EoEcFjC641Z0+aHJ0pUsF3DI=) inside your container? Compile with `gcc -lrt -pthread main.c`. I want to understand if it's a system-wide problem or if it's specific to some libraries. – SR_ Feb 22 '19 at 12:59
  • It's weird. If you are getting `drwxrwxrwt` permission on `/dev/shm/` you should be able to access it even as non-root. – SR_ Feb 22 '19 at 13:03
  • Compiling and running your program in the docker container gives `Sample data written Value of mySem=1` – SmCaterpillar Feb 22 '19 at 22:12
  • It is definitely due to uWSGI, nginx, or supervisord. If I run the Flask server in debug mode in the container, joblib works perfectly. I'll add the configs to my main post. – SmCaterpillar Feb 23 '19 at 06:17