1

This is similar to, but slightly different from, existing questions. Say I have many threads that open the same file, but each does its own fopen and maintains its own FILE pointer. a) Is it necessary to lock fwrite calls if each thread has its own FILE pointer? b) If it is necessary, is locking around fwrite enough, or could the threads flush at different times and end up intermingling their output when they flush? If so, would holding the lock across both fwrite and fflush cover it?

ByteMe95
  • Maintaining file handles to the same file in multiple threads is a mess. I would strongly suggest restructuring your system to restrict file writing to a single thread, with other threads providing information that needs to be written through a synchronized queue. – Sergey Kalinichenko Jan 04 '18 at 19:01
  • C and C++ are different languages. C++ typically doesn't use these functions. Do you mean only C? – Passer By Jan 04 '18 at 19:01
  • If you really need to solve this for performance reasons, you're probably better off using low-level calls for your particular OS, such as `write()` (or better, `pwrite()` so you only need one file descriptor) on a POSIX OS. More abstract functions such as `fwrite()` or C++ streams do not in general give you direct control of the actual system call(s) made. Note well, though, that multithreaded IO to a single file isn't really likely to be faster on typical hardware, and if the file is physically stored on a single spinning disk, multithreaded access can even be slower. – Andrew Henle Jan 04 '18 at 20:22

2 Answers

2

This question cannot be answered in the context of programming languages. As far as the programming language is concerned, those file handles are completely independent objects, and whatever you do with one has no effect whatsoever on another.

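The real question is about the operating system: can it handle multiple write operations to the same underlying file at the same time? In other words, are those writes atomic? I can't speak for all of them, but on Linux, for example, writes of less than PIPE_BUF size are atomic. (Strictly speaking, POSIX states the PIPE_BUF guarantee for pipes and FIFOs, so treat it as a rule of thumb for regular files.)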
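
To make the "one write operation per record" idea concrete, here is a minimal sketch that assumes a POSIX system. The helper names `open_for_append` and `write_record` are hypothetical, and opening with `O_APPEND` is an assumption of this sketch rather than something stated above. It only shows how to hand the kernel each record as a single `write()` call; whether concurrent writes stay uninterleaved still depends on the OS and filesystem.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file so that every write() is positioned at
 * the current end of file. O_APPEND is an assumption of this sketch. */
int open_for_append(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: emit one complete record with a single write() call,
 * so the kernel sees the record as one request. Returns 0 on success and -1
 * on error or on a short write (a short write means the record was split). */
int write_record(int fd, const char *record)
{
    size_t len = strlen(record);
    ssize_t n = write(fd, record, len);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

Each thread could hold its own descriptor from `open_for_append()` and call `write_record()` with a fully formatted line. Going through `write()` directly avoids the stdio buffer, so a record cannot linger in user space and appear later.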

SergeyA
  • Seems apropos: [Atomicity of `write(2)` to a local filesystem](https://stackoverflow.com/questions/10650861/atomicity-of-write2-to-a-local-filesystem) Note that when using higher-level functions such as `fwrite()`, one function call can result in multiple calls to `write()` - or even none at all if the data remains in the buffer for the `FILE *`. C++ streams are similar. – Andrew Henle Jan 04 '18 at 20:25
  • Hardly makes a practical difference. If no calls to `write` are made, no interleaving is happening to begin with, and it is unlikely to make multiple write calls when the size is less than PIPE_BUF. – SergeyA Jan 04 '18 at 21:33
  • I think it would make a huge difference. If no calls to `write()` are made, the data remains in the process's buffer and will wind up being written to the file at some later time barring an explicit `fflush()` call. And while `fwrite()` isn't likely to result in an unpredictable number of `write()` calls even if unbuffered, something like `fprintf()` can. – Andrew Henle Jan 05 '18 at 11:01
0

As a quick measure, yeah, you can put a lock around the I/O part. That'd work, I guarantee it. As for flushing I/O cache, I'd recommend not doing that. It's generally best to let the OS handle I/O timing, because the kernel knows best what is going on. The data won't take effect immediately after you call flush anyway, because the process is that complicated, just as with other flush operations (Java GC, glFlush, and so on). If you choose to stick with this option, be mindful of where the concurrent I/O starts and ends. You wouldn't want a case where the main thread closes the file while a worker thread is still trying to do I/O on it.
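
A minimal sketch of the locking approach, assuming POSIX threads. The helper name `locked_write` is hypothetical, and the sketch assumes every thread opened the file in append mode (`"a"`), which the question does not state. It also calls `fflush()` before releasing the lock. That goes beyond the advice above and follows the objection raised in the comments, since each `FILE *` keeps its own user-space buffer.

```c
#include <pthread.h>
#include <stdio.h>

/* One lock shared by every thread that writes to the file. */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: write one record through this thread's own FILE *
 * and push it out of the stdio buffer before releasing the lock.
 * Assumes `fp` was opened in append mode ("a"), so that separate handles
 * do not overwrite each other's data. Returns 0 on success, -1 on error. */
int locked_write(FILE *fp, const void *data, size_t len)
{
    int rc = 0;
    pthread_mutex_lock(&file_lock);
    if (fwrite(data, 1, len, fp) != len)
        rc = -1;
    /* Flush while still holding the lock: until fflush() runs, the bytes
     * may sit in this FILE's private buffer and reach the file later. */
    if (fflush(fp) != 0)
        rc = -1;
    pthread_mutex_unlock(&file_lock);
    return rc;
}
```

Every thread would call `locked_write(my_fp, buf, n)` instead of calling `fwrite()` directly. The lock only helps if all writers in the process go through it.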

The general solution to this problem is to create a thread that handles the file exclusively. If other threads need to read from or write to the file, they must ask that thread to do it for them. This is tricky, I know. You'd need to put together a simple protocol and a sync mechanism, but in a nutshell, it goes like this:

  1. Prepare a queue, a condition variable (cv), and a lock. Create a thread and open the file. It doesn't matter who opens the file.
  2. The thread starts and waits for the queue to be filled.
  3. Other threads send I/O requests to the thread. Each request includes the data for the file and an opcode.
  4. The thread handles the requests from the queue. This is where the real I/O happens.

You could use an anonymous FIFO (a pipe) instead of a queue, or skip the opcode part if the file is write-only.
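
The steps above can be sketched roughly as follows, assuming POSIX threads and a write-only file, so the opcode is skipped. All names (`struct writer`, `writer_start`, `writer_submit`, `writer_stop`) are hypothetical. The queue is an unbounded linked list, so this sketch does nothing to limit memory use when producers outrun the disk.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One pending write request; the names here are hypothetical. */
struct request {
    struct request *next;
    size_t len;
    char data[];            /* payload copied from the producer */
};

struct writer {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    struct request *head, *tail;
    int             done;   /* set when no more requests will arrive */
    FILE           *fp;     /* touched only by the writer thread */
    pthread_t       thread;
};

/* Writer thread: the only place where file I/O happens. */
static void *writer_main(void *arg)
{
    struct writer *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        while (w->head == NULL && !w->done)
            pthread_cond_wait(&w->cv, &w->lock);
        struct request *r = w->head;
        if (r == NULL) {               /* queue drained and done is set */
            pthread_mutex_unlock(&w->lock);
            return NULL;
        }
        w->head = r->next;
        if (w->head == NULL)
            w->tail = NULL;
        pthread_mutex_unlock(&w->lock);

        fwrite(r->data, 1, r->len, w->fp);   /* I/O outside the lock */
        free(r);
    }
}

int writer_start(struct writer *w, const char *path)
{
    w->head = w->tail = NULL;
    w->done = 0;
    w->fp = fopen(path, "w");
    if (w->fp == NULL)
        return -1;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cv, NULL);
    if (pthread_create(&w->thread, NULL, writer_main, w) != 0) {
        fclose(w->fp);
        return -1;
    }
    return 0;
}

/* Called from any thread: copy the data and hand it to the writer. */
int writer_submit(struct writer *w, const void *data, size_t len)
{
    struct request *r = malloc(sizeof *r + len);
    if (r == NULL)
        return -1;
    r->next = NULL;
    r->len = len;
    memcpy(r->data, data, len);

    pthread_mutex_lock(&w->lock);
    if (w->tail) w->tail->next = r; else w->head = r;
    w->tail = r;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->lock);
    return 0;
}

/* Drain the queue, stop the thread, and close the file. */
int writer_stop(struct writer *w)
{
    pthread_mutex_lock(&w->lock);
    w->done = 1;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_mutex_destroy(&w->lock);
    pthread_cond_destroy(&w->cv);
    return fclose(w->fp);
}
```

Worker threads call `writer_submit()` and never touch the `FILE *`. Calling `writer_stop()` once all producers have finished writes out whatever is still queued before the file is closed, which is the wait at shutdown that this design implies.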

Unlike network I/O, modern OSes can't do file I/Os in a non-blocking manner, so expect significant blocking time (I/O wait). There is also the problem that the queue can fill up too quickly and eat a lot of memory when I/O is relatively slow. There will be cases where the whole program has to wait for the I/O to complete before terminating, and there's not much you can do about that. On Linux you could close the file from another thread while I/O is in progress (close() is MT-safe); I don't know how that works on other OSes.

There are alternatives such as async file I/O or overlapped I/O, which involve signal handling or callbacks. Using these doesn't require creating a thread, but each has pros and cons, mostly regarding portability.

  • *As for flushing I/O cache, I'd recommend not doing that.* Multithreaded access to a file using functions such as `fwrite()` via different `FILE *` values *without* flushing with `fflush()` is pretty much guaranteed to produce corrupt data. *modern OSes can't do file I/Os in a non-blocking manner* [Yes they can.](http://pubs.opengroup.org/onlinepubs/9699919799/functions/lio_listio.html) – Andrew Henle Jan 04 '18 at 20:14
  • @AndrewHenle What do you think the A in aio stands for? I already mentioned that there are alternatives including that one. `fflush()` has nothing to do with corruption or order of IO. Only the order of `fwrite()` calls matters. –  Jan 04 '18 at 20:37
  • Well, I guess you could say aio is non-blocking, if op chooses to use posix. –  Jan 04 '18 at 20:47
  • [Not just POSIX.](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365683(v=vs.85).aspx) – Andrew Henle Jan 05 '18 at 11:02
  • Pointing out inaccuracies is not trolling. It's making sure someone who stumbles across this page in the future knows your statements aren't correct. – Andrew Henle Jan 06 '18 at 13:50