
Assume that:

  • 1) I have a folder on Linux for which I have only "read" and "execute" permissions.
  • 2) Somebody with higher privileges (i.e., with "write" permission) will put some files into that folder.
  • 3) Multiple processes of a program running in parallel will read the files and perform some data processing on them.

Requirement: each file is read by exactly one process. When done, the file is either removed or moved elsewhere. The files are never written or edited. For example, if I have 5 files and 2 processes running in parallel, I need to ensure that process 1 reads and performs some work on files 1, 3, and 5, while process 2 works on files 2 and 4. In other words, the computation is distributed at the file level; we are not distributing computation inside files.

On Linux, there are two functions that can place an exclusive lock: flock and fcntl. The problem is that both functions require "write" permission on the file.

The Boost library also has the boost::interprocess::file_lock::try_lock() function. It does the same job as the two functions above, and it likewise requires "write" permission.

Are there any other methods to place an exclusive lock on files for which I don't have "write" permission?

  • So what exactly are you synchronizing? Is file being modified while it is read? –  Jun 18 '20 at 09:01
  • No, it is only read. But I want to ensure that the file is read by only one process. That is why I need to use an exclusive lock. Using a shared lock still allows other processes to read the same file simultaneously. – thanhnn Jun 18 '20 at 09:18
  • Why? If files are read-only, it should be safe. Are you trying to distribute computation between processes using locks? –  Jun 18 '20 at 09:19
  • Computational distribution is done at the file level. For example, I have 5 files and 2 processes running in parallel. In this case, process 1 will read and perform some work on files 1, 3, 5. Process 2 will read and perform some work on files 2, 4. – thanhnn Jun 18 '20 at 09:22
  • 1
    Store the locking information somewhere else where both processes can access/write, like a shared database? – simon Jun 18 '20 at 09:29
  • @StaceyGirl: No, the scenario I provided above is totally for example. Picking which files to read and process is totally random. – thanhnn Jun 18 '20 at 09:30
  • @simon, thanks for the advice. That's an approach that we have considered. But if possible, we prefer to use something native which is supported by a well-tested framework, library. – thanhnn Jun 18 '20 at 09:32
  • 1
    @thanhnn So again, what are you synchronizing? Are you using locks to just mark files "busy" e.g. trying to distribute computation this way (as I suggested initially)? –  Jun 18 '20 at 09:38
  • @StaceyGirl, yes exactly. Using the lock here is to mark the file "busy" and it will not be touched by any other processes. – thanhnn Jun 18 '20 at 09:39
  • 1
    If a worker wants to process `file2` it can try to create `/tmp/file2`. If successful, it can go ahead. If not, it was beaten to it by another worker and it can avoid processing that file. – Mark Setchell Jun 18 '20 at 09:42
  • 1
    Or, if you call **GNU Parallel** as `sem`, it can act as a semaphore or mutex across processes. https://stackoverflow.com/a/37303133/2836621 – Mark Setchell Jun 18 '20 at 09:46
  • @thanhnn What libc are you using? It doesn't [look](https://elixir.bootlin.com/linux/v5.8-rc1/source/fs/locks.c) like `flock` syscall requires write access to place a lock (any read or write access will suffice). It also works for read-only files on my machine. You might want to try doing `flock` syscall directly. –  Jun 18 '20 at 10:14
  • Have the process that generates the file do a **Redis** `LPUSH` of the filename onto the left end of a list, and have the worker processes do a `BRPOP` off the right end of the list to get the name of a file to process. Only one worker will succeed. **Redis** has bindings in C/C++, Python, bash... – Mark Setchell Jun 18 '20 at 10:15
  • 1
    Or, create a FIFO in that directory and have the *"file creating process"* write the filename into the FIFO. The workers can then read from the FIFO to get the name of a file to process. – Mark Setchell Jun 18 '20 at 10:29

1 Answer


Any access (read or write) suffices for the Linux flock syscall to place a lock on a file, unlike fcntl locks, which require read access for a read lock and write access for a write lock.

You might be using a libc that emulates flock on top of fcntl. To get what you need, invoke the system call directly through syscall:

#include <sys/syscall.h>
#include <unistd.h>

// from include/uapi/asm-generic/fcntl.h
#define SYS_LOCK_SH 1
#define SYS_LOCK_EX 2
#define SYS_LOCK_UN 8

static int sys_flock(int fd, int op)
{
    return (int) syscall(SYS_flock, fd, op);
}

As a result, the following program should succeed:

#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* from include/uapi/asm-generic/fcntl.h */
#define SYS_LOCK_EX 2

static int sys_flock(int fd, int op)
{
    return (int) syscall(SYS_flock, fd, op);
}

int main(void)
{
    int fd = open("/etc/hosts", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* syscall() returns -1 and sets errno on failure */
    if (sys_flock(fd, SYS_LOCK_EX) == -1) {
        perror("flock");
        return 1;
    }

    return 0;
}
  • Excellent!! The solution works. Thanks a lot. Just to add, if you don't want the 2nd worker to wait, use an additional flag LOCK_NB to not block it. – thanhnn Jun 19 '20 at 03:19