72

I'm building a system where multiple slave processes are communicating via unix domain sockets, and they are writing to the same file at the same time. I have never studied filesystems or this specific filesystem (ext4), but it feels like there might be some danger here.

Each process writes to a disjoint subset of the output file (i.e., there is no overlap in the blocks being written). For example, P1 writes only to the first 50% of the file and P2 writes only to the second 50%. Or perhaps P1 writes only the odd-numbered blocks while P2 writes the even-numbered blocks.

Is it safe to have P1 and P2 (running simultaneously on separate threads) writing to the same file without using any locking? In other words, does the filesystem impose some kind of locking implicitly?

Note: I'm unfortunately not at liberty to output multiple files and join them later.

Note: My reading since posting this question does not agree with the only posted answer below. Everything I've read suggests that what I want to do is fine, whereas the respondent below insists what I am doing is unsafe, but I am unable to discern the described danger.

RayLuo
Fixee
  • the answer below suggests that it might work with fixed block sizes and that you should try it out. Did you? – Oct 25 '11 at 17:45
  • 1
    @Rich: I did try it and it worked fine (downloaded 8GiB file, md5sum checked fine). But this is one test, one file, one operating system. In any case, Tilo's answer says in bold that it is unsafe to do what I'm asking. – Fixee Oct 25 '11 at 22:23

2 Answers

40

No, generally it is not safe to do this!

You need to obtain an exclusive write lock for each process -- that implies that all the other processes will have to wait while one process is writing to the file. The more I/O-intensive processes you have, the longer the wait time.

It is better to have one output file per process and to format those files with a timestamp and process identifier at the beginning of each line, so that you can later merge and sort those output files offline.

Tip: check the file format of web-server log files -- these put the timestamp at the beginning of each line, so they can later be combined and sorted.
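
Here is a minimal sketch of that per-process approach (the file name scheme and record format are illustrative assumptions, not something prescribed above):

```c
/* Sketch: each process appends only to its own log file, prefixing every
 * line with a sortable timestamp and its PID, so the files can later be
 * merged and sorted offline (e.g. with sort(1) on the timestamp field). */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char path[64];
    snprintf(path, sizeof path, "out.%ld.log", (long)getpid());   /* one file per process */

    FILE *f = fopen(path, "a");
    if (!f)
        return 1;

    for (int i = 0; i < 10; i++) {
        time_t now = time(NULL);
        struct tm tm;
        gmtime_r(&now, &tm);

        char ts[32];
        strftime(ts, sizeof ts, "%Y-%m-%dT%H:%M:%SZ", &tm);       /* sorts lexicographically */

        fprintf(f, "%s %ld record %d\n", ts, (long)getpid(), i);
    }

    fclose(f);
    return 0;
}
```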


EDIT

UNIX processes use a certain fixed buffer size when they open files (e.g. 4096 bytes) to transfer data to and from the file on disk. Once the write buffer is full, the process flushes it to disk -- that is, it writes the complete buffer to disk. Note that this happens when the buffer is full, not when there is an end-of-line! That means that even for a single process writing line-oriented text data to a file, those lines are typically cut somewhere in the middle at the time the buffer is flushed. Only at the end, when the file is closed after writing, can you assume that the file contains complete lines.

So depending on when your processes decide to flush their buffers, they write to the file at different times -- i.e. the order is not deterministic or predictable. And when a buffer is flushed to the file, you cannot assume that it contains only complete lines -- it will usually contain partial lines, thereby garbling the output if several processes flush their buffers without synchronization.
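
The effect is easy to reproduce. The sketch below uses a deliberately tiny 64-byte stdio buffer (real buffers are typically 4 KiB or more) so that flushes visibly happen mid-line and lines from two processes end up interleaved in the shared file; the file name is made up:

```c
/* Sketch: parent and child append long lines to the same file through
 * stdio. The buffer is much smaller than a line, so each flush writes a
 * partial line, and the output of the two processes interleaves. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    FILE *f = fopen("shared.log", "a");
    if (!f)
        return 1;

    static char buf[64];                   /* tiny buffer, for demonstration only */
    setvbuf(f, buf, _IOFBF, sizeof buf);

    pid_t pid = fork();                    /* both processes keep writing to f */

    char line[257];
    memset(line, pid == 0 ? 'A' : 'B', 256);
    line[256] = '\0';

    for (int i = 0; i < 100; i++)
        fprintf(f, "%s\n", line);          /* each line is far longer than buf */

    fclose(f);
    return 0;
}
```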

Check this article on Wikipedia: http://en.wikipedia.org/wiki/File_locking#File_locking_in_UNIX

Quote:

The Unix operating systems (including Linux and Apple's Mac OS X, sometimes called Darwin) do not normally automatically lock open files or running programs. Several kinds of file-locking mechanisms are available in different flavors of Unix, and many operating systems support more than one kind for compatibility. The two most common mechanisms are fcntl(2) and flock(2). A third such mechanism is lockf(3), which may be separate or may be implemented using either of the first two primitives.

You should use either flock or mutexes to synchronize the processes and make sure only one of them can write to the file at a time.
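
A minimal sketch of the flock() variant; note that flock() is advisory, so it only works if every writer takes the lock (the file name and record format here are made up):

```c
/* Sketch: serialize appends to a shared file with an advisory flock().
 * Only one cooperating process at a time can hold LOCK_EX and write. */
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int append_record(const char *path, const char *record)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX) != 0) {         /* blocks until we own the lock */
        close(fd);
        return -1;
    }

    ssize_t n = write(fd, record, strlen(record));
    flock(fd, LOCK_UN);                    /* release for the next writer */
    close(fd);
    return n < 0 ? -1 : 0;
}
```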

As I mentioned earlier, it is probably faster, easier and more straightforward to have one output file for each process, and then later combine those files if needed (offline). This approach is used by some web servers, for example, which need to log to multiple files from multiple threads -- and need to make sure that the different threads all perform well (i.e. do not have to wait for each other on a file lock).


Here's a related post (check Mark Byers' answer; the accepted answer is not correct/relevant):

Is it safe to pipe the output of several parallel processes to one file using >>?


EDIT 2:

In the comments you said that you want to write fixed-size binary data blocks from the different processes to the same file.

Only if your block size is exactly the system's file-buffer size could this work!

Make sure that your fixed block length is exactly the system's file-buffer size. Otherwise you will get into the same situation as with the incomplete lines: e.g. if you use 16 KiB blocks and the system uses 4 KiB blocks, then in general you will see 4 KiB blocks in the file in seemingly random order -- there is no guarantee that you will always see four blocks in a row from the same process.
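
If the writes do go through stdio at all, one way to follow that advice is to make stdio's buffer exactly one block, as in this sketch (the 16 KiB block size is taken from the comments and the file name is made up; with raw write()/pwrite() there is no user-space buffer to align in the first place):

```c
/* Sketch: align stdio's buffer with the fixed block size and flush after
 * every block, so a block is never split across two separate flushes. */
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE (16 * 1024)            /* assumed fixed block size */

int write_block(FILE *f, long block_no, const char *block)
{
    if (fseek(f, block_no * (long)BLOCK_SIZE, SEEK_SET) != 0)
        return -1;
    if (fwrite(block, 1, BLOCK_SIZE, f) != BLOCK_SIZE)
        return -1;
    return fflush(f) == 0 ? 0 : -1;       /* push the whole block out at once */
}

int main(void)
{
    FILE *f = fopen("output.bin", "r+b"); /* file assumed to be preallocated */
    if (!f)
        return 1;
    setvbuf(f, NULL, _IOFBF, BLOCK_SIZE); /* buffer size == block size */

    char *block = calloc(1, BLOCK_SIZE);
    if (block)
        write_block(f, 3, block);         /* e.g. this process owns block 3 */

    free(block);
    fclose(f);
    return 0;
}
```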

Tilo
  • Could you please explain why this is unsafe? Can the filesystem become corrupt? – Fixee Oct 20 '11 at 22:42
  • 3
    Wait, `fcntl` is a) advisory, and b) allows region locking, which seems like unnecessary overhead if the original poster knows that the two threads are writing to disjoint regions at all times. (This assumes that the file was preallocated to the known final size.) – Edward Thomson Oct 21 '11 at 00:47
  • 2
    @Tilo: From the question, I understood that the OP wants the separate processes to access a completely different set of blocks in the file (somehow). It seems to me like the concerns you mention are mostly valid for interleaved sequential IO on a file. – millimoose Oct 21 '11 at 00:53
  • 3
    @Fixee: I strongly doubt it's possible to corrupt the filesystem from outside the filesystem driver. The "unsafe" meant that it's possible to get unpredictable incorrect behaviour, as is the case with any race condition. – millimoose Oct 21 '11 at 00:56
  • @Inerdia: well, if he really wants to write BINARY data with a fixed block length, he might get away with it... but it's not recommended. Definitely not recommended for text data. He will not corrupt the filesystem, but he will garble/corrupt his output file. – Tilo Oct 21 '11 at 00:57
  • @Tilo: I am writing binary data of fixed block-length (16KiB); there is **ZERO** overlap on writes between processes. I want to avoid locks if I can. – Fixee Oct 21 '11 at 01:03
  • 1
    @EdwardThomson: I am preallocating the file to a known final size, but using a sparse file. This means the filesystem is allocating space as the file is written; I was worried that these dynamic allocations would cause corrupt inodes/superblocks/etc in the filesystem, but it seems unlikely since I'm SURE ext4 must be thread-safe even in this case?!?! – Fixee Oct 21 '11 at 01:06
  • 4
    @Fixee: then my best advice is to make a little toy program and try it out. **Make sure that your fixed block length is exactly the system's file-buffer size.** Otherwise you will get into the same situation as with the incomplete lines. E.g. if you use 16k blocks, and the system uses 4k blocks, then in general you will see 4k blocks in the file in random order -- e.g. not 4 in a row from the same process... – Tilo Oct 21 '11 at 01:08
  • 2
    I meant: there is no guarantee that you will always see 4 blocks in a row from the same process – Tilo Oct 21 '11 at 01:14
  • @EdwardThomson: yes, I see your point, but what about process A writing to its section of the file, and getting interrupted by process B? – Tilo Oct 21 '11 at 01:30
  • 1
    @Tilo: I don't think writing a toy program will adequately test this, since threading problems are notoriously hard to test. And I'm sorry, but I'm still confused about what you're warning about regarding mismatched block sizes. – Fixee Oct 21 '11 at 01:42
  • 1
    @Fixee: call it "proof-of-concept" ;-) ... generate predictable but large amounts of data with threads or processes, e.g. in the first bytes write the timestamp, the PID, and a chunk number in case you use multiples of the system's block size -- the rest can be filled with a fixed pattern -- so you can later analyze the order of the blocks. – Tilo Oct 21 '11 at 03:58
  • @Tilo: I don't understand what you mean by "getting interrupted by" process B? – Edward Thomson Oct 21 '11 at 04:44
  • 2
    On Linux with a file open as O_RDWR, if you are using lseek/write instead of fseek/fwrite, there is no user-space buffering of data. So long as your writes are truly non-overlapping, you will get what you expect. For example, one process can write all of the odd bytes, and another can write all of the even bytes. This only works if the file is not being extended as you write. All bytes must already "exist" but a sparse-allocated file is fine. – Wheezil Jun 07 '16 at 19:46
  • 2
    @Wheezil That's exactly what I did (5 years ago now) on that project and it's been in production ever since. I needed to SHA1 each block as it arrived over the network, and therefore needed to multithread the program (else it would be CPU bound). Threads would write the block once-and-only-once via lseek/write to the output file. – Fixee Dec 19 '16 at 17:38
  • zombie :) @Fixee would it have been more straightforward to thread the sha1 but pass the buffers (pointers to buffers) to a single 'write' thread for the actual file writing? – Evan Benn May 30 '18 at 07:20
  • 1
    @Fixee if you're using lseek/write, how do you know that you haven't overwritten another block? Do you do additional book-keeping of what was written? What happens if two processes are doing the same lseek, and both write? – Tilo May 30 '18 at 19:04
  • @zombie The elegant solution would be a Thread Pool of sha1 workers that consume buffers and then write them out. I had very limited time so instead just spawned multiple processes giving them disjoint block ranges to download. This let me use multiple cores with very little work, though it's clunky (processes won't necessarily end at the same time, for example... though usually it's close). Poor man's parallelism. – Fixee May 31 '18 at 21:17
  • @Fixee If you used `lseek()`/`write()`, I do hope you used separate file descriptors for each thread, and that each file descriptor was obtained by calling `open()` on the actual file and not by using `dup()` on another file descriptor. You can also safely use one file descriptor across multiple threads if you use `pwrite()` instead of `lseek()`/`write()`. – Andrew Henle Jun 01 '18 at 07:30
  • I believe you are incorrect regarding buffers. This applies to fwrite and co., i.e. library functions. If a syscall like write is used, it goes directly to the kernel. What happens next is another question. I assume it might not be safe if the write is not at block granularity. – Bogdan Mart Jul 06 '22 at 17:35
  • good point @BogdanMart! thank you for adding to this 11 year old answer – Tilo Jul 07 '22 at 18:25
40

What you're doing seems perfectly OK, provided you're using the POSIX "raw" IO syscalls such as read(), write(), lseek() and so forth.

If you use C stdio (fread(), fwrite() and friends) or some other language runtime library that does its own userspace buffering, then the answer by "Tilo" is relevant: because of that buffering, which is to some extent outside your control, the different processes might overwrite each other's data.

Regarding OS locking: while POSIX states that writes or reads of size less than PIPE_BUF are atomic for some special files (pipes and FIFOs), there is no such guarantee for regular files. In practice, I think it's likely that IOs within a page are atomic, but there is no such guarantee. The OS only does locking internally to the extent necessary to protect its own internal data structures. One can use file locks, or some other interprocess communication mechanism, to serialize access to files. But all of this is relevant only if you have several processes doing IO to the same region of a file. In your case, as your processes are doing IO to disjoint sections of the file, none of this matters, and you should be fine.
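
A minimal sketch of that pattern, assuming a preallocated output file and the 16 KiB blocks mentioned in the comments; pwrite() takes an explicit offset, so it also sidesteps any races on a shared file position if a descriptor is ever shared between threads:

```c
/* Sketch: each writer owns a disjoint set of 16 KiB blocks in a
 * preallocated file and writes them with pwrite(), which takes an
 * explicit offset instead of using the shared file position. */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE (16 * 1024)

/* Write one block this writer owns; block_no must not be claimed by
 * any other process or thread. */
int write_my_block(int fd, long block_no, const void *data)
{
    off_t off = (off_t)block_no * BLOCK_SIZE;
    return pwrite(fd, data, BLOCK_SIZE, off) == BLOCK_SIZE ? 0 : -1;
}

int main(void)
{
    int fd = open("output.bin", O_WRONLY); /* file preallocated elsewhere */
    if (fd < 0)
        return 1;

    char *block = calloc(1, BLOCK_SIZE);
    if (!block) {
        close(fd);
        return 1;
    }

    for (long b = 0; b < 8; b += 2)        /* e.g. this process owns the even blocks */
        write_my_block(fd, b, block);

    free(block);
    close(fd);
    return 0;
}
```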

janneb
  • 3
    I am using lseek() to a specific multiple of 16*1024 and then using write() to write 16KiB of data. The processes **never** overlap their write-regions. – Fixee Oct 25 '11 at 22:26
  • @Fixee: Yeah, you should be fine then. – janneb Oct 26 '11 at 08:08