Is there a good way to share a multiprocessing Lock between gunicorn workers? I am trying to write a JSON API with Flask. Some of the API calls will interact with a Python class that manages a running process (like ffmpeg for video conversion). When I scale up to more than one web worker, how can I ensure that only one worker interacts with the class at a time?

My initial thought was to use multiprocessing.Lock so the start() function can be atomic. I don't think I've figured out the right place to create a Lock so that one is shared across all the workers:

# runserver.py
from flask import Flask
from werkzeug.middleware.proxy_fix import ProxyFix  # werkzeug.contrib.fixers was removed in Werkzeug 1.0
import dummy

app = Flask(__name__)

@app.route('/')
def hello():
    dummy.start()
    return "ffmpeg started"

app.wsgi_app = ProxyFix(app.wsgi_app)

if __name__ == '__main__':
    app.run()

Here is my dummy operation:

# dummy.py
from multiprocessing import Lock
import time

lock = Lock()

def start():
    lock.acquire()

    # TODO do work
    for i in range(10):
        print("did work %s" % i)
        time.sleep(1)

    lock.release()

When I refresh the page a few times, I see the output from each call interleaved, so the lock is not serializing the requests.

Am I barking up the wrong tree here? Is there an easier way to make sure that only one copy of the processing class (here just the dummy start() method) runs at a time? I think I might need something like celery to run tasks (with only 1 worker), but that seems like overkill for my small project.

peterw

3 Answers


I tried something, and it seems to work. I put preload_app = True in my gunicorn.conf and now the lock seems to be shared. I am still looking into exactly what's happening here but for now this is good enough, YMMV.
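
For reference, a minimal sketch of what such a config file could look like. Gunicorn config files are plain Python; the file name gunicorn.conf.py, the worker count and the bind address are illustrative assumptions, not values from my setup.

```python
# gunicorn.conf.py (hypothetical file name)
# Start with: gunicorn -c gunicorn.conf.py runserver:app

# Import the application in the master process before forking the workers,
# so module-level objects such as dummy.lock are created once and inherited.
preload_app = True

# Illustrative values; adjust for your deployment.
workers = 4
bind = "127.0.0.1:8000"
```

With preload_app left at its default (False), each worker imports the application itself after the fork, so each one runs `lock = Lock()` separately.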

peterw
  • Why does this work? When the lock is acquired in a worker process, doesn't it trigger copy-on-write, which means the lock is no longer shared across workers? – kennysong May 27 '21 at 07:03
  • Honestly, I have no idea. I wrote this question 9 years ago, and I've moved on to other technologies since. I think it's safe to assume everything works different now ¯\_(ツ)_/¯ – peterw Jul 05 '22 at 23:34
  • This is still relevant to me in 2023 and I've been looking into it. My best guess is that the check in the multiprocessing lib on whether a lock/semaphore is owned depends on the thread ID of the process checking the lock: https://github.com/python/cpython/blob/a87c46eab3c306b1c5b8a072b7b30ac2c50651c0/Modules/_multiprocessing/semaphore.c#L42 When you preload_app for gunicorn it probably also caches this check with the parent's thread ID, whereas without that config each child does it with their own thread ID. Still just a guess, though. – AndrewKS Apr 19 '23 at 20:28
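
On the copy-on-write question raised in the comments, a likely explanation is that on Unix a multiprocessing.Lock wraps an OS-level semaphore that lives in shared memory. Fork copies the Python wrapper object, but every copy still refers to the same underlying semaphore. The sketch below is not gunicorn itself: it assumes a platform where the "fork" start method is available, and it imitates preloading by creating the lock in the parent before any child is forked. The counter names are hypothetical.

```python
import multiprocessing as mp
import time

# Assumption: Unix platform where the "fork" start method exists.
ctx = mp.get_context("fork")

# Created once in the parent, before any fork. This mirrors a module-level
# lock when the app is imported in the gunicorn master (preload_app = True).
lock = ctx.Lock()

# Raw shared-memory counters, used only to observe overlapping workers.
active = ctx.Value("i", 0, lock=False)
max_active = ctx.Value("i", 0, lock=False)


def start():
    """Critical section: record how many processes are inside at once."""
    with lock:
        active.value += 1
        max_active.value = max(max_active.value, active.value)
        time.sleep(0.05)  # stand-in for the real work
        active.value -= 1


def run_workers(n=4):
    """Fork n children that all call start(); return the peak overlap."""
    procs = [ctx.Process(target=start) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return max_active.value


if __name__ == "__main__":
    print("peak number of workers inside the lock:", run_workers())
```

If the inherited lock is really shared, the peak should never exceed one. If each child created its own lock after the fork instead, the children would be free to overlap.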

Following peterw's answer, the workers can share the lock.

However, it is better to use a try-finally block to ensure the lock is always released.

# dummy.py
from multiprocessing import Lock
import time

lock = Lock()

def start():
    lock.acquire()

    try:
        # TODO do work
        for i in range(10):
            print("did work %s" % i)
            time.sleep(1)
    finally:
        lock.release()
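
The same guarantee can also be written with the lock as a context manager, since multiprocessing.Lock supports the with statement. A minimal sketch, where the `work` argument is a hypothetical stand-in for the real job:

```python
from multiprocessing import Lock

lock = Lock()


def start(work):
    """Run work() while holding the lock; the lock is released even on error."""
    with lock:
        return work()
```

This is equivalent to the acquire / try / finally / release sequence, just shorter.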
Emerson

Late addition:
If for some reason using preload_app is not feasible, then you need to use a named lock. This ensures that all processes use the same lock object. Without preloading, each worker imports the module on its own, so mp.Lock() creates a different lock in each process, negating any value.

I saw this package but have not used it yet. It supplies a named lock scoped to one machine: all processes on the same machine will use the same lock, but the solution is not appropriate across machine boundaries.
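
To illustrate the idea of a machine-wide named lock without relying on that package, here is a minimal sketch that uses a lock file and fcntl.flock from the standard library instead. This is a swapped-in technique, not the package's API. It assumes a Unix system, and the lock file path and the function name are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical path; the file name plays the role of the lock's "name".
LOCK_PATH = os.path.join(tempfile.gettempdir(), "ffmpeg_manager.lock")


@contextmanager
def machine_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on path for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Any worker, preloaded or not, can then wrap the critical section in `with machine_wide_lock():`. Because every process opens the same path, they all contend for the same lock, but only on a single machine.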

m02ph3u5
noam cohen
  • What is a `named lock`? I can not find that in the docs. – cclauss Nov 27 '20 at 14:31
  • I second the recommendation for [NamedAtomicLock](https://pypi.org/project/NamedAtomicLock/). I'm using it to serialize a callback in a Dash app running under gunicorn. On Linux, gunicorn uses `fork` so `mp.Lock()` will not work. – Chris Warth Jan 28 '21 at 19:21
  • Also see this stackoverflow thread with the same recommendation, https://stackoverflow.com/a/64534798/1135316 – Chris Warth Jan 28 '21 at 19:22
  • @cclauss A "named lock" or "named mutex" is the general name for a lock between processes. – kennysong May 27 '21 at 06:47