
I have a question quite similar to this question, where I need the following conditions to be upheld:

  • If a file is opened for reading, that file may only be opened for reading by any other process/program
  • If a file is opened for writing, that file may only be opened for reading by any other process/program

The solution posted in the linked question uses a third-party library which adds an arbitrary .LOCK file in the same directory as the file in question. That solution only works with respect to the program in which the library is being used; it doesn't prevent any other process/program from using the file, as they may not be implemented to check for a .LOCK association.

In essence, I wish to replicate this result using only Python's standard library.

BLUF: Need a standard library implementation specific to Windows for exclusive file locking

To give an example of the problem set, assume there is:

  • 1 file on a shared network/drive
  • 2 users on separate processes/programs

Suppose that User 1 is running Program A on the file and at some point the following is executed:

with open(fp, 'rb') as f:
    while True:
        chunk = f.read(10)
        if chunk:
            pass  # do something with chunk
        else:
            break

Thus they are iterating through the file 10 bytes at a time.

Now User 2 runs Program B on the same file a moment later:

with open(fp, 'wb') as f:
    f.write(data)  # data is some bytes-like object

On Windows, the file in question is immediately truncated and Program A stops iterating (even if it wasn't done) and Program B begins to write to the file. Therefore I need a way to ensure that the file may not be opened in a different mode that would alter its content if previously opened.
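The truncation can be reproduced in a single process. The following is a minimal sketch, assuming a throwaway temporary file and an unbuffered reader (`buffering=0`, so that reads hit the file rather than Python's internal buffer); neither detail comes from the programs above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    fp = os.path.join(tmp, "shared.bin")  # hypothetical stand-in for the shared file
    with open(fp, "wb") as f:
        f.write(b"0123456789" * 3)  # 30 bytes of sample data

    # "Program A": start reading 10 bytes at a time, unbuffered.
    reader = open(fp, "rb", buffering=0)
    first_chunk = reader.read(10)

    # "Program B": opening with 'wb' truncates the file immediately.
    writer = open(fp, "wb")
    size_after_open = os.stat(fp).st_size

    # The reader's offset is now past the end of the file, so it sees EOF.
    next_chunk = reader.read(10)

    writer.close()
    reader.close()

print(first_chunk, size_after_open, repr(next_chunk))
```

Nothing in the default `open` call asks the operating system to deny the second open, which is why the reader silently runs out of data instead of the writer receiving an error.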

I was looking at the msvcrt library, namely the msvcrt.locking() interface. What I have been successful at doing is ensuring that a file opened for reading can be locked for reading, but nobody else can read the file (as I lock the entire file):

>>> f1 = open(fp, 'rb')
>>> f2 = open(fp, 'rb')
>>> msvcrt.locking(f1.fileno(), msvcrt.LK_LOCK, os.stat(fp).st_size)
>>> next(f1)
b"\x00\x05'\n"
>>> next(f2)
PermissionError: [Errno 13] Permission denied

This is an acceptable result, just not the most desired.

In the same scenario, User 1 runs Program A which includes:

with open(fp, 'rb') as f:
    msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, os.stat(fp).st_size)
    # repeat while block
    msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, os.stat(fp).st_size)

When User 2 runs Program B a moment later, the same result occurs and the file is truncated.

At this point, I would've liked a way to throw an error to User 2 stating the file is opened for reading somewhere else and cannot be written at this time. But if User 3 came along and opened the file for reading, then there would be no problem.

Update:

A potential solution is to change the permissions of a file (with exception catching if the file is already in use):

>>> os.chmod(fp, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
>>> with open(fp, 'wb') as f:
        # do something
PermissionError: [Errno 13] Permission denied <fp>

This doesn't feel like the best solution (particularly if the users didn't have the permission to even change permissions). Still looking for a proper locking solution but msvcrt doesn't prevent truncating and writing if the file is locked for reading. There still doesn't appear to be a way to generate an exclusive lock with Python's standard library.

pstatix
  • If it's just Windows, you can call `CreateFile` (e.g. PyWin32's `win32file.CreateFile`) and set the sharing mode to the desired read/execute, write/append, and delete/rename sharing. Wrap the file handle it returns with a file descriptor via `msvcrt.open_osfhandle`. Then open the file descriptor via `open`. – Eryk Sun Mar 17 '20 at 04:09
  • @ErykSun But that is not a Python standard library implementation is it? It requires PyWin32. – pstatix Mar 17 '20 at 13:18
  • The standard library has ctypes. It's a bit more work to implement it with ctypes, assuming you set the function prototypes and properly handle errors and exceptions to make it idiomatic. – Eryk Sun Mar 17 '20 at 16:33
  • @ErykSun Yep, this is the way I am currently going. – pstatix Mar 17 '20 at 16:35
  • @ErykSun While it works as intended (I will post a solution), oddly enough `CreateFileW` doesn't throw a `FileNotFoundError`. If the path doesn't exist, it returns `-1` instead and then `msvcrt.open_osfhandle` returns an `OSError: Bad file descriptor`. Per the MSDN docs I would've thought the former error would've been raised. – pstatix Mar 17 '20 at 18:13

1 Answer

For those who are interested in a Windows-specific solution:

import os
import ctypes
import msvcrt
import pathlib

# Windows constants for file operations
NULL = 0x00000000
CREATE_ALWAYS = 0x00000002
OPEN_EXISTING = 0x00000003
FILE_SHARE_READ = 0x00000001
FILE_ATTRIBUTE_READONLY = 0x00000001  # strictly for file reading
FILE_ATTRIBUTE_NORMAL = 0x00000080  # strictly for file writing
FILE_FLAG_SEQUENTIAL_SCAN = 0x08000000
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000

_ACCESS_MASK = os.O_RDONLY | os.O_WRONLY
_ACCESS_MAP = {os.O_RDONLY: GENERIC_READ,
               os.O_WRONLY: GENERIC_WRITE
               }

_CREATE_MASK = os.O_CREAT | os.O_TRUNC
_CREATE_MAP = {NULL: OPEN_EXISTING,
               os.O_CREAT | os.O_TRUNC: CREATE_ALWAYS
               }

win32 = ctypes.WinDLL('kernel32.dll', use_last_error=True)
win32.CreateFileW.restype = ctypes.c_void_p
INVALID_FILE_HANDLE = ctypes.c_void_p(-1).value


def _opener(path: pathlib.Path, flags: int) -> int:

    access_flags = _ACCESS_MAP[flags & _ACCESS_MASK]
    create_flags = _CREATE_MAP[flags & _CREATE_MASK]

    if flags & os.O_WRONLY:
        share_flags = NULL
        attr_flags = FILE_ATTRIBUTE_NORMAL
    else:
        share_flags = FILE_SHARE_READ
        attr_flags = FILE_ATTRIBUTE_READONLY

    attr_flags |= FILE_FLAG_SEQUENTIAL_SCAN

    h = win32.CreateFileW(path, access_flags, share_flags, NULL, create_flags, attr_flags, NULL)

    if h == INVALID_FILE_HANDLE:
        raise ctypes.WinError(ctypes.get_last_error())

    return msvcrt.open_osfhandle(h, flags)


class _FileControlAccessor(pathlib._NormalAccessor):

    open = staticmethod(_opener)


_control_accessor = _FileControlAccessor()


class Path(pathlib.WindowsPath):

    def _init(self) -> None:

        self._closed = False
        self._accessor = _control_accessor

    def _opener(self, name, flags) -> int:

        return self._accessor.open(name, flags)
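
For use without pathlib's private accessor machinery, the same idea can be packaged as an opener for the builtin `open`. The following is a sketch under assumptions, not a tested drop-in: the name `shared_read_opener` is hypothetical, it only covers read-only opens, it declares `argtypes` explicitly, and it closes the handle if `msvcrt.open_osfhandle` fails, as suggested in the comments below. The Windows-only setup is guarded so the module still imports on other platforms.

```python
import ctypes
import sys

GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
OPEN_EXISTING = 3
FILE_ATTRIBUTE_NORMAL = 0x00000080
# CreateFileW signals failure with -1 cast to a pointer, not NULL.
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value

if sys.platform == "win32":  # the Win32 API and msvcrt exist only on Windows
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.argtypes = (
        wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, wintypes.LPVOID,
        wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE)
    kernel32.CreateFileW.restype = wintypes.HANDLE
    kernel32.CloseHandle.argtypes = (wintypes.HANDLE,)
    kernel32.CloseHandle.restype = wintypes.BOOL


def shared_read_opener(path, flags):
    """Return an fd for a read-only handle that shares read access only."""
    handle = kernel32.CreateFileW(
        path, GENERIC_READ, FILE_SHARE_READ, None,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None)
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # On success the fd owns the handle; closing the fd closes it.
        return msvcrt.open_osfhandle(handle, flags)
    except OSError:
        kernel32.CloseHandle(handle)  # avoid leaking the handle
        raise
```

Assuming a path `fp`, usage would look like `with open(fp, 'rb', opener=shared_read_opener) as f: ...`. While that file object is open, an attempt by another process to open the file for writing should fail with a sharing violation (surfaced in Python as `PermissionError`), while plain read-only opens should still succeed.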
pstatix
  • Use `kernel32 = WinDLL('kernel32.dll', use_last_error=True)`. Set the result type to a pointer (handle): `kernel32.CreateFileW.restype = ctypes.c_void_p`. Define `INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value`. If it returns the latter, then `raise ctypes.WinError(ctypes.get_last_error())`. – Eryk Sun Mar 17 '20 at 23:45
  • This is *not* a lock file. The share mode is per-open, not per-process, so if you don't share write access, the file cannot be reopened with write access, not even by your own process. Your code should directly use this handle, wrapped in an fd via `msvcrt.open_osfhandle`. You can subsequently open the fd as a file object via builtin `open`. – Eryk Sun Mar 17 '20 at 23:49
  • Your function that opens a handle, wraps it in an fd, and returns a file object should do the last two steps in nested `try`-`finally` blocks. If `open_osfhandle` fails, the `finally` handler should call `kernel32.CloseHandle(handle)` to avoid leaking a handle. If `open` fails, the `finally` handler should call `os.close(fd)`. After the handle is owned by the file object, do *not* call `CloseHandle` on the handle or `os.close` on the fd. Python's I/O stack owns it now. – Eryk Sun Mar 17 '20 at 23:56
  • @ErykSun Perhaps you could post a solution showing what you mean? I use `pathlib.Path` and later in the interface of my classes `pathlib.Path.open()`. If I create a handle with no sharing, I cannot even open the file in the same process (because `pathlib.Path.open()` uses the same mechanics as `open()`, which tries to create a new handle for the file). – pstatix Mar 18 '20 at 18:25
  • @ErykSun When I open the handle, I cannot map that handle through `msvcrt` to get it back into proper interface for `pathlib.Path`, which the interface is heavily built around. If there was a way to take a handle or file descriptor and wrap it in the `pathlib.Path` interface, that would be great. – pstatix Mar 18 '20 at 18:30
  • @ErykSun However, because `msvcrt.open_osfhandle` returns a file descriptor from the handle, I am forced to then use `os.fdopen()` when I need to use `pathlib.Path.open()`. – pstatix Mar 18 '20 at 18:31
  • pathlib uses the accessor `_normal_accessor`, which is an instance of `pathlib._NormalAccessor`. The accessor defines the `open` opener function as `os.open`. You can pass a custom accessor using the `template` argument, e.g. `pathlib.Path(filename, template=shared_read)`, where `shared_read` would have an `_accessor` attribute that's an instance of `_NormalAccessor`, but with `open` replaced by a custom opener that maps `flags` to the desired access and calls `CreateFileW` with read sharing and returns an fd from `msvcrt.open_osfhandle`. – Eryk Sun Mar 18 '20 at 23:13
  • Most of the work is in writing the opener, but that's useful in general as an argument for builtin `open`. – Eryk Sun Mar 18 '20 at 23:15
  • @ErykSun I cannot find this documentation anywhere, or other questions about it. Would `shared_read` be a subclass? Again I see no documentation for the `template` keyword. – pstatix Mar 19 '20 at 14:35
  • @ErykSun I was able to figure out the `template` portions by reviewing source. Could you expand on your first comment with respect to the error? You said "if it returns the latter", but I don't follow. Do you mean if it returns `INVALID_HANDLE_VALUE`? Would you say `if handle == INVALID_HANDLE_VALUE:`? – pstatix Mar 19 '20 at 18:06
  • Yes, the "latter" refers to the last value that was just defined in the previous sentence. – Eryk Sun Mar 19 '20 at 21:29
  • Most functions in Windows that return a handle, which is a typedef for a `void` pointer, will return a `NULL` pointer when they fail, but there are a few such as `CreateFileW` that return -1 cast as a pointer when they fail. To set it up right, the `restype` of the function should be set to `ctypes.c_void_p`, and the reserved value should be defined as `INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value`. In a 64-bit system this value is 18446744073709551615 (i.e. 0xffff_ffff_ffff_ffff). – Eryk Sun Mar 19 '20 at 21:34
  • If `handle == INVALID_HANDLE_VALUE`, meaning that `CreateFileW` failed, then `raise ctypes.WinError(ctypes.get_last_error())` to get an idiomatic `OSError` exception for the Windows error that caused the call to fail. Note that WINAPI `GetLastError` is not called directly in Python. In an interpreted language, the current last error value isn't necessarily the error from the failed call. Instead the DLL is loaded as `ctypes.WinDLL('kernel32', use_last_error=True)`, which has ctypes capture the last error value before the call returns, which is accessible as `ctypes.get_last_error()`. – Eryk Sun Mar 19 '20 at 21:41
  • @ErykSun Awesome, I ended up figuring it out but I wasn't sure why we were casting `c_void_p(-1)`. The docs say the fail value is `INVALID_HANDLE_VALUE` but didn't define what the value of that constant was. Also, `ctypes.WinError()` actually returns `FileNotFoundError` and not an `OSError`. – pstatix Mar 19 '20 at 22:30
  • @ErykSun Further, why does `ctypes.c_void_p(-1)` end up with a value of 18446744073709551615? – pstatix Mar 19 '20 at 22:34
  • `FileNotFoundError` is a subclass of `OSError`, which is substituted when the C `errno` value is `ENOENT`. In POSIX systems, `ENOENT` is the direct OS error code. In Windows, the C runtime maps several Windows error codes to POSIX `ENOENT`, including `ERROR_FILE_NOT_FOUND`, `ERROR_PATH_NOT_FOUND`, `ERROR_BAD_NET_NAME`, `ERROR_BAD_NETPATH`, and `ERROR_FILENAME_EXCED_RANGE` (i.e. the file path is too long). – Eryk Sun Mar 19 '20 at 22:40
  • `c_void_p` is initialized from an unsigned integral pointer value, which is typically a memory address or an opaque handle. As a 64-bit signed integer, -1 is natively (i.e. at the bare metal level in the CPU) represented as `0xFFFF_FFFF_FFFF_FFFF`, where each hexadecimal digit is 4 bits and `0xF` is `0b1111`. As an unsigned value, this is 18446744073709551615. To understand why -1 is stored this way as a signed 64-bit integer, read about [two's complement](https://en.wikipedia.org/wiki/Two's_complement). – Eryk Sun Mar 19 '20 at 22:52
  • Note that Python's `int` type itself doesn't store the value as two's complement, as it's a variable-sized "big int" implementation. However, its bitwise operations do preserve what one would expect with two's complement, e.g. `-1 & (2**64 - 1) == 18446744073709551615`. – Eryk Sun Mar 19 '20 at 22:53
  • @ErykSun On the topic of `pathlib.Path(filename, template=shared_read)`, turns out you cannot just add the keyword. Source doesn't pass the keyword to `self._init()` where `template` is set to `None`. Do you see something else indicating otherwise? – pstatix Mar 20 '20 at 16:19
  • @ErykSun Best I could do was define an accessor and replace the attribute of the Path class like `p = Path(filepath); p._accessor = shared_read`. However, I'd like to use the template keyword as it's idiomatic and should (in practice) work. – pstatix Mar 20 '20 at 16:36
  • Sorry, I misread the source. Honestly, I don't share your appreciation of pathlib, so I never use it. I disagree with integrating everything into a hierarchical class framework, especially one that's as overly complicated and difficult to extend as pathlib. I'd prefer a pure path type with a much simpler implementation -- no flavours, accessors, or selectors. I'd extend the existing high-level filesystem functionality in `os`, `os.path` and `shutil` in a top-level "fs" module with a simple, function-based API that preserves input path type instead of returning strings, where applicable. – Eryk Sun Mar 20 '20 at 21:51
  • I think the closest to clean you can get is to subclass `WindowsPath` as something like `SharedReadPath`, and override `_init`. In the new `_init`, first call `super()._init(template=template)` and then set `self._accessor = shared_read`. – Eryk Sun Mar 20 '20 at 22:11
  • @ErykSun I've updated my post. I didn't call `super()._init(template=template)`; I just overrode the definition entirely because the implementation in the program will never use the default `_accessor`. Additionally, I needed to override `_opener` and set the `open` of the `_NormalAccessor` subclass to be a staticmethod. Since `os.open` doesn't define `__get__`, additional arguments were being passed as it was treated as a member function. Do you see any need to still call `super()._init(template=template)`? – pstatix Mar 23 '20 at 22:04
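
The two points in this thread about the value of `INVALID_HANDLE_VALUE` and about `FileNotFoundError` can be checked on any platform. A small sketch; the variable names are illustrative only:

```python
import ctypes
import errno

# c_void_p stores -1 as an all-ones unsigned pointer value.
pointer_bits = 8 * ctypes.sizeof(ctypes.c_void_p)
invalid_handle_value = ctypes.c_void_p(-1).value
all_ones = (1 << pointer_bits) - 1

# Python ints are arbitrary precision, but masking mimics two's complement.
masked_minus_one = -1 & (2**64 - 1)

# OSError picks a subclass from errno; ENOENT gives FileNotFoundError.
err = OSError(errno.ENOENT, "No such file or directory")

print(invalid_handle_value == all_ones)
print(masked_minus_one)
print(type(err).__name__, isinstance(err, OSError))
```

Because `FileNotFoundError` is an `OSError`, code that catches `OSError` around a failed `CreateFileW` call also handles the missing-file case.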