I'm afraid I have to go into some length so that my question can be understood exactly. I know that forking can cause problems, especially when mixed with threads. One nasty problem I encountered was that a thread was cloned by a fork in the middle of a `localtime` call, which made it impossible to continue working in the forked process because `localtime` (which is not stateless) remained stuck in a low-level lock forever. There were also problems with the C library libmysql, for example when an open connection was forked and then closed in the forked process via `mysql_close()`. Since the forked process is a long-lived process and I cannot do without threads, I have decided, in order to avoid as many such problems as possible, to move the business-logic routine of the forked process into a separate executable binary and run it in the forked process via `execv()`. Before `execv()`, all file descriptors (FDs) are closed.

What do I still have to consider in this new Scenario B regarding low-level locks, threads, mysql, ...? Is it always safe to pay no attention to any of this in the forked process now?

OLD Scenario A:
(Compare: Forking a process with threads containing sockets in C++)

Process P1 with

  • Open receiving Socket S1 at Port 6610
  • Open SQL-Connection SQL1
  • Thread T1 with
    • Open SQL-Connection SQL2
    • Socket-Connection S2

Process P1 should be forked to P2, and P2
needs the SQL connection too. T1 is not used by P2.
Socket S1 is also not used by P2.

Before fork():

  • mysql_close() on SQL1 of P1
  • Wait until T1 is in a state where it is 100% idle and not
    making any calls such as "localtime". SQL2 remains open.

Then after fork() in P1 (parent):

  • Reopen mysql connection SQL1
  • Continue processing in T1

Then after fork() in P2 (child):

  • Reopen mysql connection SQL1
  • Close socket S1
  • T1 is "dead" in a controlled 100% idle state, without blocking anything.
  • Doing the business logic

NEW Scenario B:

Process P1 with

  • Open receiving Socket S1 at Port 6610
  • Open SQL-Connection SQL1
  • Thread T1 with
    • Open SQL-Connection SQL2
    • Socket-Connection S2

Process P1 should be forked to P2, and P2
needs the SQL connection too. T1 is not used by P2.
Socket S1 is also not used by P2.

Before fork():

  • Consider nothing

Then after fork() in P2 (child):

  • Close ALL FDs (except STDOUT/STDERR) with close()
  • execv() -> replaces all memory with the new process image I1
    • I1 opens a new SQL connection
    • I1 does the business logic

Question 1:
Scenario B seems to be the better (easier) way to me, because I do not have to deal with the states of other threads (right?), since the new process image replaces all memory and thereby all memory state such as mutexes/locks. But what happens to OS resources that survive the execv() in the new process, such as socket descriptors? More concretely: if I close all FDs with close() in the child P2, will the (open) MySQL connections SQL1 & SQL2 (which also use FDs for their internal sockets, right?) be affected? Will only the refcount be decremented, or are the SQL connections in P1 endangered? Same for S1/S2? All safe here?

Question 2:
What happens in Scenario A with SQL2 and S2 in P2, given that they remain completely untouched for a long time (days)? Could they block the freeing of resources in P1 because the descriptor refcount does not go down? Apart from that, are there any other problems here? If P2 terminates without calling mysql_close(), will the SQL connection survive in P1?

SoulfreezerXP
  • I'm no expert, but I believe there should be no issues with creating a new process by `fork`+`exec`. It's the most popular way IIRC, and if new processes froze depending on the parent's state, that would be unexpected to say the least. – yeputons Dec 20 '22 at 19:36
  • By "outgoing socket", do you mean a *connected* socket? Or simply a *non-listening* socket? Or something else? Once a socket connection is established, the sockets on each end are peers, and bidirectional communication is supported. The term "outgoing" is not conventionally used to describe any kind of socket. – John Bollinger Dec 20 '22 at 20:02
  • I meant a socket connection, established via connect() to some server. – SoulfreezerXP Dec 20 '22 at 20:30
  • The original scenario was misusing `fork()`. In a multi-threaded process, `fork()` can only safely be used to create a new process and then immediately `exec*()` something else. Per [POSIX 7](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/fork.html): "... the effects of calling functions that require certain resources between the call to `fork()` and the call to an exec function are undefined." – Andrew Henle Dec 20 '22 at 20:47
  • But if the thread is in a well-known state before being forked (sleeping in its main loop), this should work. I tried it out this way and all related errors went away. – SoulfreezerXP Dec 20 '22 at 21:10
  • @SoulfreezerXP You're assuming threads in that state aren't holding any locks, and also that it's always possible for the state to be well-known. And then, how do you guarantee that when you call `fork()` all those threads are **still** in that "well-known state"? Whatever checks you do are invalid the instant you have the results, and relying on those results is a [TOCTOU bug](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use). – Andrew Henle Dec 20 '22 at 21:50
  • I use a mutex-protected condition variable in the main thread M1 and wait for the other thread T1 to tell me it is in a safe state (sleeping) before I do the fork. The sleep in T1 is inside a while loop to also handle spurious wakeups and signals. If a signal happens, only one atomic flag is switched. After the fork I reset the condition variable (in M1) and T1 continues its work. T1 is designed to process all its work in very small chunks so that it can be parked very quickly, within milliseconds. – SoulfreezerXP Dec 21 '22 at 05:34

1 Answer


Question 1:
Scenario B seems to be the better (easier) way to me, because I do not have to deal with the states of other threads (right?), since the new process image replaces all memory and thereby all memory state such as mutexes/locks. But what happens to OS resources that survive the execv() in the new process, such as socket descriptors?

Nothing in particular happens with them. That's more or less what it means for them to survive the execv().

More concretely: if I close all FDs with close() in the child P2, will the (open) MySQL connections SQL1 & SQL2 (which also use FDs for their internal sockets, right?) be affected?

Process P2 closing (its copies of) those file descriptors will disassociate P2 from the underlying open file descriptions, but that should not affect process P1, which has its own, independent association with the open file description via its own copies of the same FDs. There might be more to say if there was some kind of custom driver involved, but that's surely not the case here.

The story might be different, however, if P2 performed application-level closures of the DB connections, as opposed to simply close()ing the underlying file descriptors.

Will only the refcount be decremented, or are the SQL connections in P1 endangered? Same for sockets S1/S2? All safe here?

Same for the sockets, listening or otherwise.

Question 2:
What happens in Scenario A with SQL2 and S2 in P2, given that they remain completely untouched for a long time (days)?

Nothing in particular happens with them. Whatever application-level state the mysql client maintains does not change, so that probably falls out of sync pretty quickly, but it continues to take up space in P2. The underlying open file descriptions for the socket and database connections remain open at least as long as P2 does not close them, counting against the user's and system's limits on open files.

Could they block the freeing of resources in P1 because the descriptor refcount does not go down?

They block the release of kernel resources dedicated to the connections, and at least the mysql client data probably continues to take up space in P2. But that has no direct effect on P1.

Apart from that, are there any other problems here?

It's hard to say what will happen when all but one of the threads of a multithreaded program suddenly vanish without a trace, in the middle of whatever they were doing at the time.

It's also hard to say what process state the child may try to rely upon that would be unsafe under the circumstances. Does the process set up and rely upon timers? They're not inherited across a fork. Does it rely on mlock() memory locks? They're not inherited either. Process-associated fcntl() record locks? Also not inherited. On the other hand, some other kinds of file locks are inherited, so you might suddenly have two different processes attempting conflicting accesses to the same files. There's more.

If P2 terminates without calling mysql_close(), will the SQL connection survive in P1?

Probably. It's rather more likely that the connection will survive in P1 in that case than if P2 did call mysql_close().

John Bollinger
  • *Nothing in particular happens with them.* Maybe. The close-on-exec flag can be set on any particular descriptor with `fcntl(fd, F_SETFD, FD_CLOEXEC)` or `open( "/path/to/file", ... |O_CLOEXEC, ...)`, and that descriptor will be closed on calling any of the `exec*()` functions. But IME it's easy enough to just get the max descriptor value from a call to `getrlimit()` and then just do `for( int f = 3; f < maxFDs; f++ ) close(f);` Just `close()`ing 'em all blindly in the `fork()`'d process is easier than checking. – Andrew Henle Dec 20 '22 at 20:57
  • Well yes, @AndrewHenle, but that particular question was "what happens here with OS resources, which survive the execv() [...]?" It's a fair point that which resources do survive is not necessarily as clear as one might think, but I would not characterize close-on-exec file descriptors as being among those. In any case, I agree that just closing all possible open file descriptors is a reasonable strategy, as long as the limit is not so high as to make that a performance issue. – John Bollinger Dec 20 '22 at 21:02
  • But your "Nothing in particular happens with them." isn't quite correct - any descriptor with the close-on-exec flag set won't survive the `exec()` call, as it will be closed. Hence my comment. And IMO the most efficient way to deal with descriptors is just to `close()` 'em all - that's faster than calling `fcntl()` on every possible file descriptor and then closing valid file descriptors without the close-on-exec flag set. – Andrew Henle Dec 20 '22 at 21:45
  • Yes, @AndrewHenle, file descriptors with close-on-exec set will not survive the `exec()`. Already stipulated. Therefore, **they are not among the *them*** (resources that survive the `execv()`) to which my statement refers. – John Bollinger Dec 20 '22 at 22:09
  • All the comments here have been very helpful to me, so I have upvoted them all. – SoulfreezerXP Dec 21 '22 at 12:14