Is forking processes (instead of forking threads) preferred in scripted web applications?

Question

In Java web applications it is typical to spawn threads to process the web requests. I am referring to the application code, not the container's threads that accept incoming client connections.
In scripting languages, e.g. Perl or Python, my understanding is that the multiprocessing paradigm (forking processes) is more common than the multithreaded one (spawning threads).
I personally find forking processes instead of threads in web application code "weird" and heavier.
Am I correct on this? Is forking processes usual during web request processing in these frameworks or not?

"In Java web applications it is typical to spawn threads to process the web requests." Are you serious? Who the hell does that? — Mike Baranczak, Jan 12 '14 at 21:57
@MikeBaranczak: I am not talking about the HTTP threads. I am talking about back-end operations, where it is typical to use threads or thread pools. An example would be a background thread that logs something related to the current request (without blocking it), or one that sends an email in the back end, etc. — Jim, Jan 12 '14 at 22:05
OK, thanks for clarifying. The choice between forking and threading depends on the OS, and on how much interaction you need between the parent and child. See: http://stackoverflow.com/questions/16354460/forking-vs-threading — Mike Baranczak, Jan 12 '14 at 22:10
No, it's not better to spawn processes than threads. In Python the global interpreter lock can be a problem for heavy computation, and in that particular case forking processes gives better results. In general, for web apps I think it is better not to fork or spawn threads but to use async I/O with frameworks like Twisted or libraries like gevent. — mguijarr, Jan 12 '14 at 22:13
@MikeBaranczak: My question is not about multiprocessing vs. multithreading in general but specifically about web applications in Perl and Python. For Perl I have read that threading has issues, so it is avoided, and for Python, searching turns up more references to forking processes than to threads — Jim, Jan 12 '14 at 22:16
@Jim For what OS? Linux uses a 1:1 thread model, so threads are processes. — Elliott Frisch, Jan 12 '14 at 22:32
@ElliottFrisch: Linux. But I thought that on every OS a thread is a lightweight process (LWP)? — Jim, Jan 12 '14 at 22:35
@Jim it depends on the OS. Generally, Linux processes are lighter than other OSes' threads. Also, Perl and Python are high-level languages, so they might provide their own threads: "[perl](http://perldoc.perl.org/threads.html) threads" and "[python](http://docs.python.org/2/library/threading.html) threads". — Elliott Frisch, Jan 12 '14 at 22:38
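Regarding the point about the global interpreter lock and heavy computation: a minimal Python sketch, assuming a Unix system (it explicitly requests the `fork` start method), of running CPU-bound work in a pool of worker processes instead of threads. `cpu_heavy` and `run_in_processes` are hypothetical names used only for illustration.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    # Pure-Python loop: it holds the GIL the whole time, so several threads
    # would take turns instead of running in parallel. Each worker process
    # has its own interpreter and therefore its own GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_processes(sizes, workers: int = 4):
    # Explicitly use the "fork" start method (Unix-only), matching the
    # fork-based model discussed here. The pool's processes are reused for
    # every item in `sizes`; no process is created per item.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return list(pool.map(cpu_heavy, sizes))


if __name__ == "__main__":
    print(run_in_processes([10, 4]))  # [285, 14]
```

For I/O-bound request handling this buys little; it only pays off when the work is dominated by pure-Python computation.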

score 2 · Answer 1 · edited May 23 '17 at 11:49

Perl threads are really heavy (see "How do I reduce memory consumption when using many threads in Perl?"). From what I have read, threads in Python are hampered by the global interpreter lock. Threads in Java seem to be more lightweight, but not as lightweight as OS threads in Linux.

If you want to do heavy networking in Perl you don't use threads but event-based programming, e.g. with AnyEvent or POE, similar to Python, which has the Twisted framework. There are several web servers based on these frameworks. Java also has the NIO framework, and even in C, modern fast web servers like nginx use event-based programming instead of threads or processes.
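To illustrate the event-based model, here is a minimal sketch using Python's standard `asyncio` module as a stand-in for Twisted/AnyEvent/POE (those frameworks are not used here). One thread runs an event loop, and each connection is a coroutine rather than a thread or process. The handler names and the fixed `hello` response are hypothetical; this is a toy, not a real HTTP implementation.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Each connection is a coroutine: while it waits for I/O, the single
    # event loop thread runs the other connections.
    await reader.readuntil(b"\r\n\r\n")  # consume the request headers
    body = b"hello\n"
    writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port: int) -> bytes:
    # Minimal client: send one request and read until the server closes.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET / HTTP/1.0\r\n\r\n")
    await writer.drain()
    data = await reader.read()
    writer.close()
    await writer.wait_closed()
    return data


async def demo(n_clients: int = 10) -> list:
    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients are served concurrently by one thread, one process.
        return await asyncio.gather(*(fetch(port) for _ in range(n_clients)))


if __name__ == "__main__":
    responses = asyncio.run(demo())
    print(len(responses), responses[0].split(b"\r\n")[0])
```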

I don't know of any common web server which forks to process a request. If they fork at all, they use a pre-forking model, i.e. they fork a number of worker processes up front (or worker threads, if they use threads instead of processes), and when a new request comes in, it is handled by one of the existing workers. This has much less overhead than a fork-on-request model, which only very simple servers use. Servers with event-based processing might fork too, but usually only to make effective use of multiple CPUs (e.g. one process per CPU).
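The pre-forking model can be sketched in a few lines of Python. This is a toy under stated assumptions (Unix-only because of `os.fork`; `handle`, `start_prefork_server` and `stop_workers` are hypothetical names), not how any particular production server is implemented: the parent binds one listening socket, forks a fixed number of children once, and every child loops on `accept()` on the shared socket, so nothing is forked per request.

```python
import os
import signal
import socket


def handle(conn: socket.socket) -> None:
    # Hypothetical request handler: read the request and reply with this
    # worker's PID, so it is visible which pre-forked worker served it.
    conn.recv(1024)
    conn.sendall(str(os.getpid()).encode())


def start_prefork_server(num_workers: int = 4, host: str = "127.0.0.1", port: int = 0):
    """Bind one listening socket, then fork num_workers children up front.

    Returns (bound_port, child_pids). Unix-only (uses os.fork).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)
    bound_port = listener.getsockname()[1]
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child: serve connections until killed
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        handle(conn)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)
    listener.close()  # the parent only supervises; children keep their own copies
    return bound_port, pids


def stop_workers(pids) -> None:
    # Terminate and reap every worker.
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

A caller would do `port, pids = start_prefork_server(2)`, send requests to `port`, and later call `stop_workers(pids)`; all responses carry one of the PIDs in `pids`, showing that the same few processes are reused across requests.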

With pre-forking web servers the web application usually does not fork at all, but just uses the current process. Event-based web servers often handle only static content internally (and fast); for the slower dynamic content they connect via interfaces like FastCGI (FCGI) to other processes, which are often pre-forking. This saves resources, because with normal web pages most requests are for static content.

There might still be a reason to fork within a web application: if you need to do some work in the background (like resizing uploaded images) while the page is already finished and the content should be sent to the user. But even in this case it scales much better to have a dedicated process/thread doing this work and only feed it with tasks.

As for the performance of creating a thread vs. a process: forking a process is inexpensive on Unix/Linux (but not on Windows), because it simply clones the existing process structures and marks all shared memory pages (i.e. initially all pages) as copy-on-write. Only when the new process writes to memory do the changed pages get copied (that's the expensive part). The cost of creating threads differs vastly between programming languages and operating systems and is not necessarily lower than the cost of forking a new process.
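The semantics that copy-on-write provides can be observed from Python on Unix (this sketch does not measure the copying itself, only its visible effect). After `fork()` the child starts with the same data as the parent, but once the child writes to it, the write lands in the child's private copy and the parent never sees it. `fork_and_modify` is a hypothetical helper name.

```python
import os


def fork_and_modify() -> tuple:
    # Right after fork(), parent and child share these pages copy-on-write.
    data = [0] * 1_000_000
    r, w = os.pipe()
    pid = os.fork()  # Unix-only
    if pid == 0:
        # Child: the write forces the kernel to copy the touched page(s)
        # for the child only.
        os.close(r)
        data[0] = 42
        os.write(w, str(data[0]).encode())
        os.close(w)
        os._exit(0)
    os.close(w)
    child_value = int(os.read(r, 64))
    os.close(r)
    os.waitpid(pid, 0)
    # The parent's list is untouched by the child's write.
    return child_value, data[0]


if __name__ == "__main__":
    print(fork_and_modify())  # (42, 0)
```

One caveat for CPython specifically: reference counts live inside each object, so even reading objects in the child writes to their pages and triggers copies. Forked CPython workers therefore tend to share less memory than the copy-on-write description alone suggests.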

Perl threads are really heavy *when used improperly*. Same goes for processes. — ikegami, Jan 13 '14 at 01:25
Creating threads in Perl, and their memory usage, is much more costly than threads in other languages, because the whole interpreter, including everything which was not explicitly shared, gets cloned. Then you pay about a 15% performance loss just for using a Perl compiled with thread support. But threads can be used in a way where these disadvantages do not matter much (like with a worker pool), as long as one is aware of them. — Steffen Ullrich, Jan 13 '14 at 05:49
@SteffenUllrich: The contents of your answer are useful and informative, but the answer (except a tiny part) is not related to my question. My question is **not** about the thread pools that handle incoming client connections (where one can use blocking approaches or, as you say, NIO in Java) or about event-based frameworks. The question is focused solely on back-end tasks that need to be done **inside** the web application, e.g. logging data in a separate thread from the framework's client-connection thread, etc. — Jim, Jan 13 '14 at 09:09
I've tried to describe the bigger picture, because I found some assumptions of the question ambiguous. Now I see that your focus is on work which needs to be done after the content was created and sent to the client. If it is not too much work, you can just do it without using a new thread or process; otherwise I would neither fork nor create a new thread, because then I could quickly run out of resources if lots of requests come in. Instead I would use a worker approach, where the worker could be either an external process or an internal thread pool. — Steffen Ullrich, Jan 13 '14 at 18:39
"where the worker could be either an external process or an internal thread pool" Using an external process also feels weird. As for the internal thread pool, from the comments it seems that Perl's threads are not recommended. So it is not clear to me what, in a scripting framework like Perl, corresponds to the standard thread-pool approach in Java
@SteffenUllrich: Not weird (concerning the external process), but too much work for something that I would expect to be needed in almost any web application — Jim, Jan 13 '14 at 19:21
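For the specific logging example raised in the comments, Python's standard library already ships the worker approach: `logging.handlers.QueueHandler` puts records on a queue from the request thread, and `logging.handlers.QueueListener` runs one background thread that passes them to the slow handlers. A minimal sketch; `ListHandler` and the logger name `webapp.requests` are hypothetical, and `ListHandler` only stands in for a slow file, syslog or SMTP handler.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stand-in for a slow handler (file, syslog, SMTP...); it just collects messages."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


log_queue = queue.Queue()  # hand-off point between request code and the logging thread
sink = ListHandler()
listener = logging.handlers.QueueListener(log_queue, sink)  # owns one background thread

logger = logging.getLogger("webapp.requests")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))


def handle_request(path: str) -> str:
    # Only enqueues the record; the slow emit() runs on the listener's thread.
    logger.info("served %s", path)
    return "200 OK"


if __name__ == "__main__":
    listener.start()
    handle_request("/index.html")
    listener.stop()  # processes the remaining records, then joins the thread
    print(sink.messages)  # ['served /index.html']
```

The listener is started once at application startup and stopped at shutdown, so no thread is created per request.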