is it safe to write to a file in another thread?

Question

I do not know if this is OK, but it compiles:

typedef struct
{
   int fd;
   char *str;
   int c;
} ARG;

void *ww(void *arg){
   ARG *a = (ARG *)arg;
   write(a->fd,a->str,a->c);

   return NULL;
}


int main (void) {

   int fd = open("./smf", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
   int ch = fork();

   if (ch==0){
      ARG *arg; pthread_t p1;
      arg->fd = fd;
      arg->str = malloc(6);
      strcpy(arg->str, "child");
      arg->c = 6;

      pthread_create( &p1, NULL, ww, arg);
   } else {
      write(fd, "parent\0", 7);
      wait(NULL);
   }

   return 0;
}

I am wait()ing in the parent, but I do not know whether I should also call pthread_join to join the thread, or whether wait() does that implicitly. However, is it even safe to write to the same file from two threads? I ran it a few times and sometimes the output was 1) parentchild but sometimes only 2) parent, with no other cases. I do not know why the child did not write as well when the parent wait()s for it. Can someone please explain these outputs?

The posted code fails to handle the case where the call to `fork()` fails. – user3629249 Jan 13 '20 at 09:06
This line: `pthread_create( &p1, NULL, ww, arg);` should be followed by `pthread_join( p1, NULL );` and `exit( EXIT_SUCCESS );` – user3629249 Jan 13 '20 at 09:07
Regarding this parameter: `S_IRWXU`. The `X` makes the output file executable, which seems like overkill. – user3629249 Jan 13 '20 at 09:15
Regarding `"parent\0"`: the `\0` is a NUL byte, and the compiler will also append a NUL byte. You don't need a pair of NUL bytes at the end. – user3629249 Jan 13 '20 at 09:19
The I/O stream `stdout` is buffered; the data is only passed to the terminal under certain conditions. I strongly suggest that each call to `write()` be immediately followed by a call to `fflush()`. – user3629249 Jan 13 '20 at 09:21

score 3 · Answer 1 · answered Jan 12 '20 at 13:05

You need to call pthread_join() in the child process to avoid a race condition during the child process’s exit sequence (for example, the child process can otherwise exit before its thread gets a chance to write to the file). Calling pthread_join() in the parent process won’t help, because the thread exists only in the child process.

As for the file, having both processes write to it is safe in the sense that it won’t cause a crash, but the order in which the data is written to the file will be indeterminate since the two processes are executing concurrently.
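A minimal sketch of the fix described above, not the asker's exact program: the child joins its thread before exiting, and `ARG` lives on the stack so no uninitialized pointer is involved. The helper name `write_from_parent_and_child`, the `len` field, the `S_IRUSR | S_IWUSR` mode and the use of `_exit()` in the child are choices made for this illustration only.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    int fd;
    const char *str;
    size_t len;
} ARG;

/* Thread body: write the buffer; return NULL on success, non-NULL on failure. */
static void *ww(void *arg)
{
    ARG *a = arg;
    ssize_t n = write(a->fd, a->str, a->len);
    return (n == (ssize_t)a->len) ? NULL : (void *)1;
}

/* Hypothetical helper: returns 0 if both writes succeeded, -1 otherwise. */
int write_from_parent_and_child(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {                 /* fork() can fail; handle it */
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: the thread exists only in this process, so join it here. */
        ARG arg = { fd, "child", 5 };
        pthread_t t;
        void *res = NULL;
        if (pthread_create(&t, NULL, ww, &arg) != 0)
            _exit(1);
        pthread_join(t, &res);       /* the write has completed once this returns */
        _exit(res == NULL ? 0 : 1);
    }

    /* Parent: write, then wait for the child process (not its thread). */
    int status = 0;
    ssize_t n = write(fd, "parent", 6);
    waitpid(pid, &status, 0);
    close(fd);
    return (n == 6 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling `write_from_parent_and_child("./smf")` from `main()` should leave both strings in the file every time; which one comes first still depends on scheduling.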

Jeremy Friesner

But how does it differ? When I call fork() for a new process, is it also a new thread? It should be within the same memory space as the parent, shouldn't it? So when I call `pthread_create`, is it ANOTHER thread in the already fork()ed process? I do not get why I cannot `pthread_join` in the parent but must do so in the child after the creation – Herdsman Jan 12 '20 at 13:14
fork() creates a child process which is a copy of the parent process at the instant fork() was called. So, like the parent process, your child process will only have the ‘default thread’ that started execution at main(). Your child process then goes on to call pthread_create(), which spawns a thread in the child process (but not in the parent process), hence the need to call pthread_join() in the child process (but not in the parent process) – Jeremy Friesner Jan 12 '20 at 14:05

score 1 · Answer 2 · answered Jan 12 '20 at 13:27

I do not know if this is OK, but it compiles:

Without even any warnings? Really? I suppose the code you are compiling must include all the needed headers (else you should have loads of warnings), but if your compiler cannot be persuaded to spot

buggy.c:30:15: warning: ‘arg’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       arg->fd = fd;
             ^

then it's not worth its salt. Indeed, variable arg is used uninitialized, and your program therefore exhibits undefined behavior.

But even if you fix that, after which the program can be made to compile without warnings, it still is not ok.

I am wait()ing in the parent, but I do not know whether I should also call pthread_join to join the thread, or whether wait() does that implicitly.

The parent process is calling wait(). This waits for a child process to terminate, if there are any. Period. It has no implications for the behavior of the child prior to its termination.

Moreover, in a pthreads program, the main thread is special: when it returns from main() (or when any thread calls exit()), the whole process terminates, including all other threads. Your child process therefore suffers from a race condition: the main thread returns immediately after creating a second thread, without ensuring that the other thread finishes first, so it is unpredictable how much, if any, of the second thread's work is actually performed. To avoid this issue, yes, in the child process, the main thread should join the other one before itself terminating.

However, is it even safe to write to the same file from two threads?

It depends -- both on the circumstances and on what you mean by "safe". POSIX requires the write() function to be thread-safe, but that does not mean that multiple threads or processes writing to the same file cannot still interfere with each other by overwriting each other's output.

Yours is a somewhat special case, however, in that parent and child are writing via the same open file description in the kernel, the child having inherited an association with that from its parent. According to POSIX, then, you should see both processes' output (if any; see above) in the file. POSIX provides no way to predict the order in which those outputs will appear, however.
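The shared open file description can be observed directly. In the sketch below only the child writes, yet the parent's file offset moves, because both descriptors refer to the same kernel object. The helper name `offset_after_child_write` and the file mode are assumptions made for this illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's file offset after the child
 * has written 5 bytes, or -1 on error. */
off_t offset_after_child_write(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: write through the inherited descriptor, then leave. */
        ssize_t n = write(fd, "child", 5);
        _exit(n == 5 ? 0 : 1);
    }

    /* Parent: wait for the child, then ask where the offset is now. */
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent never wrote, but the offset belongs to the shared
     * open file description, so it reflects the child's write. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

Because the offset is shared, a later write by the parent lands after the child's bytes instead of on top of them.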

I ran it a few times and sometimes the output was 1) parentchild but sometimes only 2) parent, with no other cases. I do not know why the child did not write as well when the parent wait()s for it. Can someone please explain these outputs?

The child process can terminate before its second thread performs its write. In this case you will see only the parent's output, not the child's.

John Bollinger

I know `arg->fd` is reported as uninitialized, but I do not know why: I assign `arg->fd = fd`. – Herdsman Jan 12 '20 at 14:05
And how does the race condition work? If I don't join in the child, but the parent still wait()s for it, then why would the second thread not finish? – Herdsman Jan 12 '20 at 14:09
And a last one: `yours is special - the same descriptor`. So you are saying that no matter how many threads write to a particular file descriptor, they will NOT overwrite each other (that is how I understood your claim), but with another construct such as `FILE *` (from fopen, for example) there is a possibility of overwriting outputs? – Herdsman Jan 12 '20 at 14:16
you have not answered my questions yet – Herdsman Jan 12 '20 at 18:37
@Herdsman, regarding `arg`, you do not provide an initializer in the declaration, so it is not initialized. You also do not subsequently assign a value to it before you try to dereference its (pointer) value to write to the object it hypothetically points to. Probably you want to declare `arg` as an `ARG`, not a pointer to one, and adjust your usage of it accordingly. – John Bollinger Jan 12 '20 at 18:44
@Herdsman, the race condition is not about whether the second thread terminates, but rather about what work it performs before it terminates. If the main thread terminates first then the second is forcibly terminated at that time, too, regardless of how far it has progressed in its work. In particular, that may happen before it executes its `write()` call. – John Bollinger Jan 12 '20 at 18:48
@Herdsman, I said the same *open file description*, not the same file descriptor. These are different things, albeit related. The former is a kernel data structure, whereas the latter is a number, not necessarily unique, that refers to an open file description in the scope of some specific process. And a `FILE` object or a pointer to one is yet again something different. My comments specifically refer to using the `write()` function, and that alone. – John Bollinger Jan 12 '20 at 18:55
With that said, threads (only) writing to streams associated with the same open file description via the stdio output functions also will not overwrite each others' output, but the actual output you may observe in this case is complicated by the buffering that streams normally perform. – John Bollinger Jan 12 '20 at 18:59
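The descriptor versus description distinction can be sketched with `dup()` and a second `open()`. `dup()` yields a new descriptor number for the same open file description, while opening the path again creates a separate description with its own offset. The helper name `compare_offsets` and its out-parameters are invented for this illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes 5 bytes through one descriptor and reports
 * the offsets seen through a dup()ed descriptor and a separately opened
 * one. Returns 0 on success, -1 on error. */
int compare_offsets(const char *path, off_t *dup_off, off_t *reopen_off)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    int same = dup(fd);                /* new number, same description */
    int other = open(path, O_WRONLY);  /* new number, new description  */
    int rc = -1;

    if (same != -1 && other != -1 && write(fd, "child", 5) == 5) {
        *dup_off = lseek(same, 0, SEEK_CUR);     /* follows the write      */
        *reopen_off = lseek(other, 0, SEEK_CUR); /* still at the beginning */
        rc = 0;
    }

    if (same != -1)
        close(same);
    if (other != -1)
        close(other);
    close(fd);
    return rc;
}
```

Writers that share a description advance one common offset; writers with separate descriptions each start from their own offset, and that is the situation in which one can overwrite the other's output.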
How is the buffering performed? How is it complicated? If they have the same file description (in kernel internals, as you say), then how does it affect stdio operation? Why can they not overwrite each other despite having the same description; what does it contain that is so special? Please either extend your answer or add more comments. It is not complete. – Herdsman Jan 12 '20 at 22:00
the kernel puts all writes to a specific file, with the same file descriptor, in a queue in FIFO order. – user3629249 Jan 13 '20 at 09:30
You have many questions, @Herdsman, and we have reached the point where their connection to the one to which my answer is actually directed is tenuous. I will not explain the whole C and POSIX standards to you in the context of this one answer. You are free to post additional SO questions. – John Bollinger Jan 13 '20 at 11:39
@JohnBollinger at least tell me in what sense it is special (buffering). How is buffering done when both processes use the same fd? (I am talking about "it is complicated in respect to buffering performed".) That's all, my last question. – Herdsman Jan 13 '20 at 11:55
@Herdsman, streams are buffered by default. Data printed to a buffered stream initially go to the buffer, not the ultimate I/O device, and that can affect the order in which data actually reach the device when multiple processes are involved. Moreover, such I/O buffers reside in a process's own memory, so are duplicated across a `fork()`. That can cause unwanted duplicate output if not handled properly (by, say, flushing all as-yet unwritten data before forking). – John Bollinger Jan 13 '20 at 13:27
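The duplicated-buffer effect described in the last comment can be reproduced with a small sketch. It assumes a regular file opened with `fopen()`, which is fully buffered by default; the helper name `size_after_fork` and its `flush_before_fork` flag are invented for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes "hello" to a stream, forks, lets both
 * processes close the stream, and returns the resulting file size in
 * bytes (or -1 on error). */
long size_after_fork(const char *path, int flush_before_fork)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fputs("hello", fp);        /* 5 bytes, still in the user-space buffer */
    if (flush_before_fork)
        fflush(fp);            /* empty the buffer before it gets copied */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(fp);
        return -1;
    }

    if (pid == 0) {
        fclose(fp);            /* flushes the child's copy of the buffer */
        _exit(0);              /* skip atexit handlers and other streams */
    }

    waitpid(pid, NULL, 0);
    fclose(fp);                /* flushes the parent's copy of the buffer */

    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_size;
}
```

Without the flush, each process writes its own copy of the buffered "hello", so the text appears twice; with the flush, the buffer is already empty when `fork()` copies it and the text appears once.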
