
Although the kernel marks pages (and page tables) as copy-on-write to make the fork syscall work efficiently, the creation and tear-down of page tables and related structures is still an expensive task.

Thus I wonder why the Linux community has never implemented posix_spawn as a real kernel syscall that just spawns a new process, eliminating the need to call fork beforehand. Instead, posix_spawn is just a poor glibc wrapper around fork and exec.

The performance gains would be significant for workloads that have to spawn thousands of new processes every second. The latency of launching new processes would improve as well.
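For reference, the fork-then-exec pattern discussed here can be sketched as follows. This is a minimal illustration in Python rather than C, and the helper name `spawn_with_fork_exec` is made up for this example. It only shows the two-step shape of the pattern and says nothing about how glibc implements posix_spawn.

```python
import os

def spawn_with_fork_exec(argv):
    """Start argv[0] the classic way: fork the caller, then exec in the child."""
    pid = os.fork()  # duplicates the calling process (copy-on-write)
    if pid == 0:
        # Child: replace the freshly duplicated image right away.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # Only reached if exec failed; skip Python-level cleanup handlers.
        os._exit(127)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

Calling `spawn_with_fork_exec(["sh", "-c", "exit 7"])` runs the shell in a child and returns its exit code. The duplicated address space exists only for the short window between `os.fork()` and `os.execvp()`, which is the overhead the question is about.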

Mike76
  • This is not a good SO question. Maybe try some programming subreddits or mailing lists targeted at Linux development? Also, for the record, due to all the things that must happen on Windows, CreateProcess is slower than fork+exec on Linux. https://stackoverflow.com/questions/47845/why-is-creating-a-new-process-more-expensive-on-windows-than-linux – viraptor Dec 15 '16 at 22:55
  • I don't think having posix_spawn implemented in the kernel would speed it up much. It's what the kernel needs to do that takes most of the time. Not the system call overhead. – Petr Skocik Dec 15 '16 at 23:09
  • If you want to load code up without forking a new process and replacing the process image, then dlopen a library or a position independent executable. That's fast, and gets you most of what spawning a new process can do, sans some security isolation, and possible setuid-based privilege escalation. – Petr Skocik Dec 15 '16 at 23:14
  • Have you looked at `vfork`? – that other guy Dec 15 '16 at 23:57
  • Yes, I know vfork, and I think it is a crappy solution to this performance issue. It is like saying, hey, let's provide a broken version of fork instead of giving the option to avoid fork in the first place – Mike76 Dec 16 '16 at 00:00

2 Answers


That's basically what posix_spawn is for. It is also a more flexible API. The real bug is that the Linux exec man page still doesn't include a cross-reference for it.
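As a rough sketch of what the single-call style looks like, here is a Python version using `os.posix_spawnp`, which wraps the C library's `posix_spawnp`. The helper name `spawn` is made up for this example. Whether the underlying libc uses fork, vfork, clone, or a dedicated kernel call depends on the platform and libc version, so this shows only the API shape, not the cost.

```python
import os

def spawn(argv):
    """Start argv[0] with a single library call and return its exit code."""
    # os.posix_spawnp searches PATH for argv[0], like execvp does.
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # Wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

For example, `spawn(["sh", "-c", "exit 3"])` returns the shell's exit code. No user code runs between process creation and the new program image, which is the restriction the API trades for not needing a full fork.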

dgatwood
  • Actually, I am not sure whether posix_spawn really does that. http://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux explains that it does some kind of vfork internally. Concerning the man page: I did not even find a man page for posix_spawn on my local machine, although comprehensive man pages for fork and exec are preinstalled – Mike76 Dec 15 '16 at 22:58
  • It sounds like the Linux implementation of `posix_spawn` is a libc-level implementation, so you don't get the reduction in the number of system calls that you do on BSD systems, but that performance difference is pretty negligible when `fork` is implemented with copy-on-write anyway. – dgatwood Dec 15 '16 at 23:07
  • I disagree with this point: even with copy-on-write, fork still involves iterating over a lot of mapping entries and invalidating caches. In addition, the whole overhead of duplicating the process is wasted – Mike76 Dec 15 '16 at 23:44
  • POSIX: "Also, although they may be an efficient replacement for many fork()/exec pairs, their goal is to provide useful process creation primitives for systems that have difficulty with fork(), not to provide drop-in replacements for fork()/exec. This view of the role of posix_spawn() and posix_spawnp() influenced the design of their API. It does not attempt to provide the full functionality of fork()/exec in which arbitrary user-specified operations of any sort are permitted between the creation of the child process and the execution of the new process image" – Mike76 Dec 15 '16 at 23:58
  • The original question was whether there was a function that started a child process without the need to make multiple calls, which pretty much precludes doing anything between the two calls. – dgatwood Dec 16 '16 at 01:16
  • As for invalidating caches, you would almost certainly have to do that when you start a new process anyway, because otherwise one process would be reading data from the prior process. You're right that it requires iterating the vm mapping structures, which is why it would be better if Linux implemented posix_spawn in the kernel like other OSes. With that said, the penalty is not large, and if you're spawning so many processes that it matters, you probably have much bigger architectural problems than the lack of an in-kernel posix_spawn. :-) – dgatwood Dec 16 '16 at 01:19
  • Without fork, we would not have to trash the cached TLB entries of the *current* process, which wants to spawn a new process. You cannot expect to clear the writable bit of a page table entry and still keep the cached translation entries alive. In addition, iterating over the VM mapping structures of a large process leads to tons of cache misses during the execution of fork. Finally, after fork has finished, the parent process takes a number of minor faults (e.g. on stack pages, data pages, heap pages). This slows down the continuing execution of the parent process – Mike76 Dec 16 '16 at 15:17

Fork with copy-on-write is very expensive. To illustrate this, you might want to read about the implementation of classic vfork semantics in NetBSD. The mail provides some hard numbers for a real-world use case: building software. COW for very large programs is also an easily measurable penalty. A friend of mine wrote his own spawn daemon for his Java application, because fork+exec from an 8GB+ JVM took way too long.
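One way to check the claim on your own machine is a small timing loop like the one below. This is only a sketch: the helper name `time_fork` is made up for this example, it measures just the time the parent spends inside `os.fork()`, and the numbers depend heavily on the kernel, the page size, and whether huge pages are in use.

```python
import mmap
import os
import time

def time_fork(resident_mb, rounds=20):
    """Return the mean seconds the parent spends inside os.fork()."""
    buf = bytearray(resident_mb * 1024 * 1024)
    # Touch one byte per page so the pages are really mapped in the parent.
    buf[::mmap.PAGESIZE] = b"\x01" * len(range(0, len(buf), mmap.PAGESIZE))
    total = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits at once; no exec needed here
        total += time.perf_counter() - start
        os.waitpid(pid, 0)  # reap the child before the next round
    return total / rounds
```

Running something like `for mb in (1, 64, 512): print(mb, time_fork(mb))` should show how the cost changes as the parent's touched memory grows. The later copy-on-write faults in the parent are not included in this measurement.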

The main problem with vfork in the modern world is that it can interact badly with multi-threading. For example, consider that the post-vfork code has to reference a function that hasn't been resolved by the dynamic linker yet. The dynamic linker now has to lock itself, which can result in deadlocks with the original program.

joerg
  • This is exactly what I wanted to point out. Everybody thinks that copy-on-write is so efficient that the cost of fork does not matter at all, but it still involves changing a lot of page table entries and then handling COW minor page faults, which is anything but free – Mike76 Dec 16 '16 at 15:14
  • As you can see this site is really not fit for somewhat lower level stuff. Stuff which gets upvoted is typically uninformed and often repeats popular myths/misconceptions. I can only recommend you take questions of the sort to e.g. mailing lists. –  Dec 16 '16 at 17:46