
Can joblib.Memory be used to write to a common cache in a thread-safe manner across multiple processes? In what situations, if any, will this fail or cause an error?

Caleb
  • Moved from question body: Related is https://stackoverflow.com/q/25033631/1361752, which asks how to apply joblib across multiple processes. That question was answered. However, the comments under that answer indicated that writing is **mostly** thread-safe. My question here is what caveat makes it only mostly thread-safe. – Caleb Mar 02 '21 at 16:11


The library first writes to a temporary file and then moves the temporary file to the destination. Source code:

def _concurrency_safe_write(self, to_write, filename, write_func):
    """Writes an object into a file in a concurrency-safe way."""
    temporary_filename = concurrency_safe_write(to_write,
                                                filename, write_func)
    self._move_item(temporary_filename, filename)
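The same two-step pattern can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not joblib's code: the helper name write_then_move and the use of pickle as the write function are hypothetical. Only the temporary-file naming and the final os.replace() mirror the excerpts quoted in this answer.

```python
import os
import pickle
import tempfile
import threading


def write_then_move(obj, filename):
    """Hypothetical helper: pickle ``obj`` into a private temporary file,
    then move it onto ``filename`` in a single os.replace() call."""
    # Name is unique per process and per live thread, like joblib's scheme.
    temporary_filename = '{}.thread-{}-pid-{}'.format(
        filename, id(threading.current_thread()), os.getpid())
    with open(temporary_filename, 'wb') as f:
        pickle.dump(obj, f)
    # On POSIX, readers see either the old file or the new one, never a partial write.
    os.replace(temporary_filename, filename)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, 'output.pkl')
        write_then_move({'answer': 42}, target)
        with open(target, 'rb') as f:
            print(pickle.load(f))  # {'answer': 42}
```

Because the destination is only ever touched by the final rename, a concurrent writer can at worst replace a complete result with another complete result.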

Writing to the temporary file seems safe among processes on the same machine because the file name includes the process id (pid). It also seems safe among threads of the same process because the name includes a thread identifier, the id() of the current Thread object. Source:

def concurrency_safe_write(object_to_write, filename, write_func):
    """Writes an object into a unique file in a concurrency-safe way."""
    thread_id = id(threading.current_thread())
    temporary_filename = '{}.thread-{}-pid-{}'.format(
        filename, thread_id, os.getpid())
    write_func(object_to_write, temporary_filename)

    return temporary_filename
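A quick way to check the thread part is to generate the temporary name from several threads at once and verify that no two collide. The helpers temporary_name and names_from_concurrent_threads are hypothetical; temporary_name copies the naming expression from the excerpt above. Since id() is only guaranteed unique among objects that are alive at the same time, the sketch keeps every Thread object referenced until all names are collected.

```python
import os
import threading


def temporary_name(filename):
    """Same naming expression as concurrency_safe_write (copied for illustration)."""
    thread_id = id(threading.current_thread())
    return '{}.thread-{}-pid-{}'.format(filename, thread_id, os.getpid())


def names_from_concurrent_threads(filename, n_threads=8):
    """Hypothetical helper: collect the temporary name each thread would use."""
    names = []
    lock = threading.Lock()

    def worker():
        name = temporary_name(filename)
        with lock:  # guard the shared list
            names.append(name)

    # The list keeps every Thread object alive, so their id() values stay distinct.
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names


names = names_from_concurrent_threads('/cache/output.pkl')
print(len(names), len(set(names)))  # 8 8
```

Only strings are built here, so the path does not need to exist. Note that the pid part only distinguishes processes on one machine.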

Moving the temporary file to the destination has caused problems on Windows, so joblib wraps os.replace() in a retry loop there. Source:

if os.name == 'nt':
    # https://github.com/joblib/joblib/issues/540
    access_denied_errors = (5, 13)
    from os import replace

    def concurrency_safe_rename(src, dst):
        """Renames ``src`` into ``dst`` overwriting ``dst`` if it exists.
        On Windows os.replace can yield permission errors if executed by two
        different processes.
        """
        max_sleep_time = 1
        total_sleep_time = 0
        sleep_time = 0.001
        while total_sleep_time < max_sleep_time:
            try:
                replace(src, dst)
                break
            except Exception as exc:
                if getattr(exc, 'winerror', None) in access_denied_errors:
                    time.sleep(sleep_time)
                    total_sleep_time += sleep_time
                    sleep_time *= 2
                else:
                    raise
        else:
            raise
else:
    from os import replace as concurrency_safe_rename  # noqa

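From this source code you can see that on Windows the write can still fail. If every attempt to move the temporary file to the destination raises an access-denied error (winerror 5 or 13), the loop retries with exponential backoff (sleeping 1 ms, 2 ms, 4 ms, ...) and gives up once the total sleep time reaches 1 s, which happens after 10 retries (about 1.02 s in total). Note also that the final bare raise in the while ... else branch runs outside any except block, so on Python 3 it raises RuntimeError ("No active exception to reraise") rather than the original permission error, unless the caller is itself handling an exception.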
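To make the timing concrete, here is a hedged re-implementation of the same backoff idea. The name rename_with_backoff and the injectable replace_func and sleep_func parameters are hypothetical, added so the loop can be exercised on any OS. It is a simplification of the quoted code: it retries on any PermissionError instead of checking winerror, it makes one last attempt after the final sleep, and it re-raises the last error from inside the except block.

```python
import os
import time


def rename_with_backoff(src, dst, replace_func=os.replace,
                        sleep_func=time.sleep, max_sleep_time=1.0,
                        first_sleep=0.001):
    """Hypothetical sketch of a rename retry loop with exponential backoff.

    Retries ``replace_func(src, dst)`` on PermissionError, doubling the sleep
    each time, until the total sleep reaches ``max_sleep_time``.
    Returns the list of sleeps performed; re-raises the last error on give-up.
    """
    sleeps = []
    total_sleep_time = 0.0
    sleep_time = first_sleep
    while True:
        try:
            replace_func(src, dst)
            return sleeps
        except PermissionError:
            if total_sleep_time >= max_sleep_time:
                raise  # inside the except block, so the PermissionError propagates
            sleep_func(sleep_time)
            sleeps.append(sleep_time)
            total_sleep_time += sleep_time
            sleep_time *= 2
```

Passing a recording function such as list.append as sleep_func lets you inspect the schedule without actually waiting: with the defaults, a destination that stays locked produces the sleeps 1, 2, 4, ..., 512 ms, i.e. 10 retries totalling 1023 ms.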

The same source code links to issue #540, which describes the Windows errors and was closed with the comment:

Fixed by #541 (hopefully).

The "(hopefully)" suggests that the author could not guarantee that the fix was definitive, but the issue has not been reopened, so the problem has probably not recurred.

On other operating systems there is no special logic and no retry: the standard os.replace() is used directly. Its documentation mentions cases where it "may fail" and also states that it "will be an atomic operation":

Rename the file or directory src to dst. If dst is a directory, OSError will be raised. If dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement).
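The two documented behaviors that matter here, silently replacing an existing file and raising OSError when dst is a directory, can be checked in a scratch directory. This is a small sketch; demo_replace_semantics is a hypothetical name, and the exact OSError subclass raised in the directory case differs between operating systems.

```python
import os
import tempfile


def demo_replace_semantics():
    """Hypothetical helper: exercise two os.replace behaviors in a temp dir."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, 'new.txt')
        dst = os.path.join(d, 'old.txt')
        for path, text in ((src, 'new contents'), (dst, 'old contents')):
            with open(path, 'w') as f:
                f.write(text)

        # An existing destination *file* is replaced without an error.
        os.replace(src, dst)
        with open(dst) as f:
            contents = f.read()
        src_still_exists = os.path.exists(src)

        # A *directory* as destination raises some OSError subclass
        # (e.g. IsADirectoryError on Linux), as the documentation says.
        with open(src, 'w') as f:
            f.write('again')
        subdir = os.path.join(d, 'subdir')
        os.mkdir(subdir)
        try:
            os.replace(src, subdir)
            raised = False
        except OSError:
            raised = True
    return contents, src_still_exists, raised


print(demo_replace_semantics())  # ('new contents', False, True)
```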

If no one is changing the permissions of the destination directories, this operation is unlikely to fail. The "src and dst are on different filesystems" case should not arise, because the source path (the temporary file) is built by appending a suffix to the destination path, so both files are in the same directory.
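The same-directory argument can also be checked directly: appending a suffix only changes the last path component, so the temporary file shares the destination's directory and therefore its filesystem. In this sketch, temporary_name reuses joblib's naming expression and same_filesystem is a hypothetical helper that compares st_dev values from os.stat().

```python
import os
import tempfile
import threading


def temporary_name(filename):
    # joblib-style suffix: only the last path component changes.
    return '{}.thread-{}-pid-{}'.format(
        filename, id(threading.current_thread()), os.getpid())


def same_filesystem(path_a, path_b):
    """Hypothetical helper: True if both existing paths are on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


with tempfile.TemporaryDirectory() as d:
    dst = os.path.join(d, 'output.pkl')
    tmp = temporary_name(dst)
    with open(tmp, 'wb'):
        pass  # create an empty temporary file
    print(os.path.dirname(tmp) == os.path.dirname(dst))  # True
    print(same_filesystem(tmp, d))                        # True
    os.replace(tmp, dst)  # same directory, so not a cross-filesystem rename
    print(os.path.exists(dst), os.path.exists(tmp))      # True False
```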

Other questions that deal with the atomicity of rename:

Hernán Alarcón