
https://pypi.python.org/pypi/lockfile/0.12.2 states:

This package is deprecated. It is highly preferred that instead of using this code base that instead fasteners or oslo.concurrency is used instead

However, the fasteners documentation is clear that it is not thread-safe:

Warning: There are no guarantees regarding usage by multiple threads in a single process

And I cannot find an example of using oslo.concurrency.
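For reference, oslo.concurrency's lockutils does expose an interprocess lock as a context manager. A minimal sketch (the lock name and lock_path are arbitrary choices of mine; the library is third-party, so a no-op fallback is included purely so the snippet runs without it — the fallback does not synchronize anything):

```python
import contextlib
import tempfile

try:
    from oslo_concurrency import lockutils  # third-party: pip install oslo.concurrency
except ImportError:
    lockutils = None

if lockutils is not None:
    def held():
        # external=True backs the lock with a lock file, so unrelated
        # processes that share the same lock_path serialize on the name.
        return lockutils.lock("my-app-lock", external=True,
                              lock_path=tempfile.gettempdir())
else:
    def held():
        # Illustration-only fallback when the library is not installed.
        return contextlib.nullcontext()

with held():
    pass  # critical section
```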

There's also some suggestion that using flock can resolve this situation, but the flock manual states:

(https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2)

The flock system call applies or removes an advisory lock on the file associated with the file descriptor fd. A lock is applied by specifying an operation argument that is one of LOCK_SH or LOCK_EX with the optional addition of LOCK_NB. To unlock an existing lock, operation should be LOCK_UN.

Advisory locks allow cooperating processes to perform consistent operations on files, but do not guarantee consistency (i.e., processes may still access files without using advisory locks possibly resulting in inconsistencies).
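The advisory caveat only matters if some process bypasses the lock; as long as every participant acquires it, flock does serialize access. A sketch using the stdlib fcntl module (Unix-only; the lock file path is an assumption of mine):

```python
import fcntl
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "my-app.lock")  # assumed path

def lock():
    # Each call opens its own descriptor; flock ties the lock to the open
    # file description, so separate opens conflict even within one process
    # (i.e. between threads), as well as across processes.
    fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR, 0o644)
    fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
    return fd

def unlock(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)

fd = lock()
# ... critical section ...
unlock(fd)
```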

So...

Here's a Python program whose lock and unlock functions need to be implemented so that the action is performed by only one thread, in one instance of the process, at a time.

(hint: launch with python test.py 1 & python test.py 2 & python test.py 3)

How would I fix this code so that it works correctly?

import sys
import time
import random
import threading

def lock():
  pass  # Something here?

def unlock():
  pass  # Something here?

def action(i):
  lock()
  id = threading.current_thread()
  pid = sys.argv[1]
  print("\n")
  for i in range(5):
    print("--> %s - %s - %s " % (i, id, pid))
  unlock()

class Worker(threading.Thread):
  def run(self):
    for i in range(10):
      action(i)

for _ in range(2):
  Worker().start()

The current, incorrect output looks like this:

--> 0 - <Worker(Thread-2, started 123145310715904)> - 2
--> 3 - <Worker(Thread-1, started 123145306509312)> - 1
--> 0 - <Worker(Thread-2, started 123145310715904)> - 1
--> 1 - <Worker(Thread-2, started 123145310715904)> - 2
--> 2 - <Worker(Thread-2, started 123145310715904)> - 2
--> 1 - <Worker(Thread-2, started 123145310715904)> - 1
--> 4 - <Worker(Thread-1, started 123145306509312)> - 1

and should look more like:

--> 0 - <Worker(Thread-2, started 123145310715904)> - 1
--> 1 - <Worker(Thread-2, started 123145310715904)> - 1
--> 2 - <Worker(Thread-2, started 123145310715904)> - 1
--> 3 - <Worker(Thread-2, started 123145310715904)> - 1
--> 4 - <Worker(Thread-2, started 123145310715904)> - 1
--> 0 - <Worker(Thread-2, started 123145310715904)> - 2
etc.
Nick Chammas
Doug

2 Answers


Synchronizing related processes

If you can change your architecture to fork off your processes from the same parent, multiprocessing.Lock() should be enough. For example, this makes the threads run serially:

import functools
import multiprocessing
import threading
import time

lock = multiprocessing.Lock()

def thread_proc(lock):
    with lock:
        for i in range(10):
            print("IN THREAD", threading.current_thread())
            time.sleep(1)

threads = [threading.Thread(
    target=functools.partial(thread_proc, lock))
    for i in [1, 2]
]
for thread in threads:
    thread.start()

A potential problem is that multiprocessing.Lock is slightly underdocumented: I cannot give you a definitive reference stating that multiprocessing.Lock objects are also suitable as thread locks.

That said: On Windows, multiprocessing.Lock is implemented using CreateSemaphore(), hence you get a cross-process, threading-safe lock. On Unix systems you get a POSIX semaphore, which has the same properties.

Portability might also be a problem, because not all *NIX systems have POSIX semaphores (FreeBSD still has a port option to compile Python without POSIX semaphore support).
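If that matters for your deployment, the missing-semaphore case is at least detectable at runtime: on builds without a working sem_open, creating the lock raises ImportError. A small probe (a sketch):

```python
import multiprocessing

def cross_process_lock():
    # On platforms whose libc lacks a working sem_open implementation,
    # multiprocessing lazily imports its synchronize module and that
    # import fails with ImportError.
    try:
        return multiprocessing.Lock()
    except ImportError:
        raise RuntimeError("POSIX semaphores unavailable on this platform")

lock = cross_process_lock()
with lock:
    pass  # critical section
```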

See also Is there any reason to use threading.Lock over multiprocessing.Lock? and Martijn Pieters' comment and answer on Why python multiprocessing manager produce threading locks?

Synchronizing unrelated processes

However, as stated in your question, you have unrelated processes. In that case, you need a named semaphore and Python does not provide those out of the box (although it actually uses named semaphores behind the scenes).

The posix_ipc library exposes those for you. It also seems to work on all relevant platforms.
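A rough sketch of what that looks like with posix_ipc (the semaphore name is arbitrary; the library is third-party, so a thread-only fallback is included purely so the snippet runs without it — the fallback does not synchronize across processes):

```python
import threading

try:
    import posix_ipc  # third-party: pip install posix-ipc
except ImportError:
    posix_ipc = None

if posix_ipc is not None:
    # A named semaphore is visible to unrelated processes by name.
    # O_CREAT creates it on first use; initial_value=1 makes it a mutex.
    _sem = posix_ipc.Semaphore("/my-app-lock", posix_ipc.O_CREAT,
                               initial_value=1)
    def lock():
        _sem.acquire()
    def unlock():
        _sem.release()
else:
    # Illustration-only fallback: serializes threads, not processes.
    _local = threading.Lock()
    def lock():
        _local.acquire()
    def unlock():
        _local.release()

lock()
# ... critical section ...
unlock()
```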

dhke
  • Unfortunately forking is not possible. The process spawning is done inside the server and not available for modification. This is for a 'user land' script running inside another application. `posix_ipc` seems promising, I'll see if it's usable, but we do have to support OSX in this case, and the platform limitations on that library seem pretty severe. – Doug Mar 03 '16 at 10:29
  • @Doug I cannot test it for lack of a Mac here, but it seems that if you can live with a simple lock without a timeout, posix_ipc should still work for you. But in essence, I'd test it. – dhke Mar 03 '16 at 10:56
  • Yep, this seems to work on all the platforms we need. :) – Doug Mar 04 '16 at 00:52

I believe you can build a cross-thread, cross-process lock by using SQLite from the Python standard library. It's not elegant, but for a low throughput use case it should work great. And it should work on Linux, macOS, and Windows.

If you don't care about Windows support, or if you're OK with using external libraries, you're better off with one of the options discussed in the other answer.

Anyway, back to SQLite: The basic idea is to piggy-back on the locking SQLite already provides when updating a database, and use that as your application lock. If you need multiple locks, then each lock will need its own SQLite database since SQLite does not allow concurrent write activity against the same database.

The only caveat is that you must confirm that SQLite was built with the THREADSAFE=1 compiler option, which makes it safe to use in a multithreaded environment. The pyenv-provided Python I'm running, for example, was built with THREADSAFE=1. The python3 that gets bundled with macOS 11.6, on the other hand, is built with THREADSAFE=2.
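Note that the stdlib also reports a coarser, DB-API-level summary of the same build option via sqlite3.threadsafety, which may be enough if you don't need the exact THREADSAFE value (a sketch; on Pythons before 3.11 the attribute is hardcoded to 1 rather than derived from the build):

```python
import sqlite3

# DB-API 2.0 levels: 0 = threads may not share the module,
# 1 = threads may share the module but not connections,
# 3 = threads may share the module, connections, and cursors.
# Since Python 3.11 this is derived from SQLite's THREADSAFE
# compile-time option; earlier versions always report 1.
print(sqlite3.threadsafety)
```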

Here's a working solution, using your test script:

import sqlite3
import sys
import threading
from contextlib import contextmanager

INT32_MAX = 2147483647


def confirm_sqlite_threadsafe():
    db = sqlite3.connect(':memory:')
    threadsafe_option = "THREADSAFE"
    with db:
        threadsafe = int(
            db.execute(
            f"""
                select substr(compile_options, {len(threadsafe_option) + 2})
                from pragma_COMPILE_OPTIONS
                where compile_options like '{threadsafe_option}=%'
            """)
            .fetchone()[0]
        )
    # If you just need a cross-process lock (vs. a cross-process _and_ cross-thread
    # lock) you can change this check to `threadsafe in [1, 2]`.
    # See: https://www.sqlite.org/compile.html#threadsafe
    if threadsafe != 1:
        raise RuntimeError(
            "SQLite was not built with the threading mode set to 'Serialized'. "
            "For more information: https://www.sqlite.org/compile.html#threadsafe"
        )


@contextmanager
def lock():
    # The different processes must point to the same database file.
    db = sqlite3.connect("lock.sqlite")
    # Keep waiting if blocked.
    # See: https://sqlite.org/c3ref/busy_timeout.html
    db.execute(f"PRAGMA busy_timeout = {INT32_MAX}")
    with db:
        db.execute("CREATE TABLE IF NOT EXISTS lock(a INT PRIMARY KEY)")
        db.execute("DELETE FROM lock")
        db.execute("INSERT INTO lock VALUES (1)")
        # Yield from inside the transaction to hold a lock on the table.
        yield


def action(i):
    with lock():
        id = threading.current_thread()
        pid = sys.argv[1]
        print("\n")
        for i in range(5):
            print("--> %s - %s - %s " % (i, id, pid))


class Worker(threading.Thread):
    def run(self):
        for i in range(10):
            action(i)


if __name__ == '__main__':
    confirm_sqlite_threadsafe()
    for _ in range(2):
        Worker().start()
Nick Chammas