What is the difference between threads and forked processes in Unix?

Question

I know a forked process does not share memory and threads do, but then how can forked processes communicate with one another?

Here is an example. The version with a thread is commented out (that version terminates), while the version with fork never ends. The code relies on the global variable done:

#include <stdio.h>
#include <stdbool.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>

bool done = false;

void *foo(void *arg){
    sleep(1);
    done = true;
    return 0;
}

int main(){
    //pthread_t t1;
    //pthread_create(&t1, NULL, foo, NULL);
//
    //printf("waiting...\n");
    //while(!done){}
    //printf("Ok. Moving on.\n");
    
    printf("waiting...\n");
    if(!fork()){
        foo(NULL);
    } else {
        while(!done){}
        printf("OK. moving on.\n");
    }
}

So if forked processes do not share data (i.e. global variables?) unlike threads, how do they otherwise communicate in unix?

EDIT: this is definitely not a duplicate, as I have already seen similar topics like Forking vs Threading and other documents about fork/threads in *nix. I just want to know the use cases of both (e.g. Windows has no fork, only threads, so they probably had different use cases in mind?).

Forked processes commonly communicate through piped IO streams (the output of one is the input of another, and so on), which is actually how command shells such as bash work internally. There are several other forms of Inter Process Communication (IPC) as well, such as shared memory, signals, sockets (similar to IO), and more. – h0r53, Dec 15 '20 at 14:44
Tried to search "difference between processes and threads"? There are tons of information about it. — Eugene Sh., Dec 15 '20 at 14:44
This is a somewhat advanced topic, but based on the sample code you've provided you should look into IPC in C programs. I'm sure you'll find several examples of how to create "equivalent" code to your threading example by using fork. — h0r53, Dec 15 '20 at 14:47
fork() is basically a really bad API from back in the dark ages before multi-threading was even invented. It lives on because *nix simply loves horrible APIs and must preserve them for all eternity. There's probably no sound reason to ever use fork() nowadays. If you want to start another process, then start _another_ process, not a copy of the current one with all the useless bloat carried over - that's a really dumb thing to do for multiple reasons, RAM use, readability and program safety being some of the main reasons not to. Most of the time, the correct solution is to use threads. — Lundin, Dec 15 '20 at 14:52
@Lundin There is nothing bad about `fork()`, you just have to use it correctly. The problem is that many libraries do not support it, and you can't use it with these libraries without creating memory leaks. `fork()` creates a new process. Your RAM use argument makes no sense because of COW. Readability and safety are advantages of `fork()`. `fork()` is a lot safer than using threads; do not use threads over processes, because threads are very hard to program safely and even harder to test for race conditions. – 12431234123412341234123, Dec 15 '20 at 15:15
@12431234123412341234123 "No reason not to use ... in _some_ circumstances" is also a dodgy statement. I was simply addressing the OP's final question "if forked processes do not share data ... how do they otherwise communicate in unix?" The answer of which is IPC. I simply started with the example of IO pipes because the most common example of fork I've worked with are in command shells. There are many forms of IPC available though, which is why I provided other options. — h0r53, Dec 15 '20 at 15:22
@h0r53 pipes are often used for communication between processes with different executables. I doubt this is also the case for communications between child and parent with the same executable. — 12431234123412341234123, Dec 15 '20 at 15:28
`fork` -> `dup2` -> `exec` is a very common pattern for command shells, but this is starting to become off topic, as the relevant response to the OP's question was IPC, of which there are multiple options — h0r53, Dec 15 '20 at 15:37

12431234123412341234123 · Accepted Answer · 2020-12-15T15:57:08.383

fork() copies the current process. Without special preparation, almost no data is exchanged between child and parent. The new process starts out identical to the old one, but as soon as either process writes to a variable, a copy of the written region is created (copy-on-write) and the writer gets its own physical memory for that data. This means setting a variable in the child will not be visible to the parent and vice versa.

You can use shared memory, pipes, files, sockets, signals, and probably other IPC methods to communicate between child and parent. For your special case you can use the wait() or waitpid() function to wait till your child exits. But I assume you want to know how to exchange data.

Shared memory

You can use the mmap() call to reserve memory that is shared between parent and child.

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

You can pass the flag MAP_SHARED | MAP_ANONYMOUS to flags to create a memory region that is shared. There you can place the shared variable and both can access it. Here is an example.

#include <stdbool.h>
#include <sys/mman.h>

//creates a region of shared memory to store a bool
//must be called before fork() so that both processes inherit the mapping
static bool *reserveSharedMemory(void)
  {
    void *data = mmap(NULL, sizeof(bool), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if(MAP_FAILED==data)
      {
        //do some error handling here
        return NULL;
      }
    bool *p=data;
    *p=false;
    return p;
  }

Sockets

Sockets allow you to send and receive data. With socketpair() you can create two connected socket file descriptors and communicate by writing to one of them and reading from the other, or vice versa. This way, communicating with the child process becomes almost the same as communicating over a network socket.

score 0 · Answer 2 · answered Dec 15 '20 at 15:23

When you execute a fork, you create a copy of the process you are executing, with a different PID. The variables declared before the fork() call will appear in both processes. fork returns 0 in the "child" process and returns the PID of the "child" process in the "parent" process (with a switch, you can control the behavior of both processes).

If you want processes created by fork() to communicate, you can declare an array of file descriptors such as int fd[2] BEFORE the fork and call pipe(fd). If the result of pipe isn't -1, you have created two "cables": fd[0] for reading and fd[1] for writing.

Here you can see an example on how this can work

score 0 · Answer 3 · answered Dec 15 '20 at 15:27


As you probably already know, a forked thread is a child of the main thread that called fork(). It gets initialized with a copy of the address space and the file descriptor table of the father, while it shares the open files table. As someone already said, it doesn't really make sense to use a forked thread when you can just create a new one, because it's never a good idea to have two copies of the same thread. One note: you can create a forked thread which shares all the data with the father using "vfork", but this one is REALLY DEPRECATED; I added this just as additional information. You can use pipes, sockets etc. to communicate between father and child if you want to, and you can determine whether you're in the father or the child thread by checking the pid.

answered Dec 15 '20 at 15:27

Matteo Pinna


Re, "...a forked thread..." The thing that `fork()` creates is called a "process," not a "thread." – Solomon Slow Dec 15 '20 at 19:19
Re, "...using vfork...is really deprecated." That's not what "deprecated" means. We say that an API feature is _deprecated_ if somebody once thought it was a good idea, but then they later changed their mind, and they don't intend to support it in the future. The `vfork(...)` system call was _never_ meant to be used in the way that you describe. Vfork is a highly specialized optimization of fork(). There only ever was one correct way to use it, which is for the child process's _very next_ system call to be an `exec(...)` call. – Solomon Slow Dec 15 '20 at 19:32
