203

The Java tutorials say that creating a Thread is expensive. But why exactly is it expensive? What exactly happens when a Java Thread is created that makes its creation expensive? I'm taking the statement as true; I'm just interested in the mechanics of Thread creation in the JVM.

Thread lifecycle overhead. Thread creation and teardown are not free. The actual overhead varies across platforms, but thread creation takes time, introducing latency into request processing, and requires some processing activity by the JVM and OS. If requests are frequent and lightweight, as in most server applications, creating a new thread for each request can consume significant computing resources.

From Java Concurrency in Practice
By Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, Doug Lea
Print ISBN-10: 0-321-34960-1

kachanov
  • I don't know the context in which the tutorials you've read say this: do they imply that the creation itself is expensive, or that "creating a thread" is expensive. The difference I try to show is between the pure action of making the thread (let's call it instantiating it or something), or the fact that you have a thread (so using a thread: obviously having overhead). Which one is claimed // which one do you wish to ask about? – Nanne Mar 30 '11 at 07:12
  • @typoknig - Expensive compared to NOT creating a new thread :) – willcodejavaforfood Mar 30 '11 at 07:25
  • possible duplicate of [Java thread creation overhead](http://stackoverflow.com/questions/2117072/java-thread-creation-overhead) – Paul Draper Jun 28 '14 at 23:32
  • threadpools for the win. no need to always create new threads for tasks. – Alexander Mills Oct 14 '15 at 08:48
  • Alternatively, the *virtual threads* feature (also known as *fibers*) coming to Java via [*Project Loom*](https://wiki.openjdk.java.net/display/loom/Main) is *not* expensive. Loom maps many virtual threads to one actual platform/host thread to greatly improve performance in situations where threads often block. For more info, see the most recent presentations and interviews by Ron Pressler of Oracle. Early access to Loom-enabled JVMs is available now. – Basil Bourque Feb 07 '21 at 04:08

6 Answers

172

Why is creating a Thread said to be expensive?

Because it *is* expensive.

Java thread creation is expensive because there is a fair bit of work involved:

  • A large block of memory has to be allocated and initialized for the thread stack.
  • System calls need to be made to create / register the native thread with the host OS.
  • Descriptors need to be created, initialized and added to JVM-internal data structures.

It is also expensive in the sense that the thread ties down resources as long as it is alive; e.g. the thread stack, any objects reachable from the stack, the JVM thread descriptors, the OS native thread descriptors.

The costs of all of these things are platform specific, but they are not cheap on any Java platform I've ever come across.
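
To make the stack part concrete, here is a minimal sketch (the class and thread names are just illustrative). The four-argument Thread constructor takes a stackSize hint in bytes; the JVM is free to round it or ignore it entirely, and the -Xss option sets the default for threads that don't specify one.

public class StackSizeSketch {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
                System.out.println("running in " + Thread.currentThread().getName());

        // Ask (as a hint only) for a ~1 MiB stack for this one thread.
        // The stack is the single largest per-thread allocation described above.
        Thread t = new Thread(null, task, "big-stack-thread", 1024 * 1024);
        t.start();
        t.join();
    }
}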


A Google search found me an old benchmark that reports a thread creation rate of ~4000 per second on a Sun Java 1.4.1 on a 2002 vintage dual processor Xeon running 2002 vintage Linux. A more modern platform will give better numbers ... and I can't comment on the methodology ... but at least it gives a ballpark for how expensive thread creation is likely to be.

Peter Lawrey's benchmarking indicates that thread creation is significantly faster these days in absolute terms, but it is unclear how much of this is due to improvements in Java and/or the OS ... or higher processor speeds. But his numbers still indicate a 150+ fold improvement if you use a thread pool versus creating/starting a new thread each time. (And he makes the point that this is all relative ...)


The above assumes native threads rather than green threads, but modern JVMs all use native threads for performance reasons. Green threads are possibly cheaper to create, but you pay for it in other areas.

Update: The OpenJDK Loom project aims to provide a light-weight alternative to standard Java threads, among other things. They are proposing virtual threads which are a hybrid of native threads and green threads. In simple terms, a virtual thread is rather like a green thread implementation that uses native threads underneath when parallel execution is required.

As of now (July 2023) Project Loom has become JEP 444. It has been in preview since Java 19, and is proposed for full release in Java 21.
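
By way of illustration, here is a minimal sketch of the JEP 444 API, assuming a Java 21 (or newer) runtime; the names and the task count are illustrative.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) throws InterruptedException {
        // A one-off virtual thread.
        Thread vt = Thread.ofVirtual().name("request-1").start(
                () -> System.out.println("hello from " + Thread.currentThread()));
        vt.join();

        // Or one cheap virtual thread per task, with no pooling at all.
        try (ExecutorService es = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                es.submit(() -> {
                    // blocking work would go here; the JVM can typically reuse
                    // the carrier thread while a virtual thread is blocked
                });
            }
        } // close() waits for the submitted tasks to finish
    }
}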


I've done a bit of digging to see how a Java thread's stack really gets allocated. In the case of OpenJDK 6 on Linux, the thread stack is allocated by the call to pthread_create that creates the native thread. (The JVM does not pass pthread_create a preallocated stack.)

Then, within pthread_create the stack is allocated by a call to mmap as follows:

mmap(0, attr.__stacksize, 
     PROT_READ|PROT_WRITE|PROT_EXEC, 
     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)

According to man mmap, the MAP_ANONYMOUS flag causes the memory to be initialized to zero.

Thus, even though it might not be essential that new Java thread stacks are zeroed (per the JVM spec), in practice (at least with OpenJDK 6 on Linux) they are zeroed.

Stephen C
  • @Raedwald - it is the initialization part that is expensive. Somewhere, something (e.g. the GC, or the OS) will zero the bytes before the block is turned into a thread stack. That takes physical memory cycles on typical hardware. – Stephen C Mar 31 '11 at 06:34
  • "Somewhere, something (e.g. the GC, or the OS) will zero the bytes". It will? The OS will if it requires allocation of a new memory page, for security reasons. But that will be uncommon. And the OS might keep a cache of already zero-ed pages (IIRC, Linux does so). Why would the GC bother, given that the JVM will prevent any Java program reading its content? Note that the standard C `malloc()` function, which the JVM might well use, does *not* guarantee that allocated memory is zero-ed (presumably to avoid just such performance problems). – Raedwald Apr 01 '11 at 10:12
  • http://stackoverflow.com/questions/2117072/java-thread-creation-overhead/2117127#2117127 concurs that "One major factor is the stack memory allocated to each thread". – Raedwald Apr 01 '11 at 10:17
  • @Raedwald - see updated answer for info on how the stack is actually allocated. – Stephen C Apr 09 '11 at 14:18
  • It is possible (probable even) that the memory pages allocated by the `mmap()` call are copy-on-write mapped to a zero page, so their initialisation happens not within `mmap()` itself, but when the pages are first *written* to, and then only one page at a time. That is, when the thread starts execution, with the cost borne by the created thread rather than the creator thread. – Raedwald Apr 12 '11 at 12:41
  • @Raedwald, the usual behavior of newly created threads is to start off on the same CPU, even if a lot of cores/CPUs (on different sockets) are available. That creates local latency at the very least; it can also starve the creating thread of CPU cycles. (There have been some heuristics to help that thread, so it can actually spawn enough and delegate work.) Registering with the host OS requires kernel hop(s); separate calls to mmap on current Linux kernels impose extra overhead. – bestsss Jun 30 '11 at 07:47
  • @bestsss - that's OS / scheduler specific. @Raedwald, it is largely immaterial which thread bears the cost. The big picture is that the zeroing uses memory bandwidth, which potentially affects every thread / processor. – Stephen C Jun 30 '11 at 07:55
  • @Stephen, it's an example how it works on a commodity OS (I think windows is practically the same except thread priorities work in different way). Perhaps it was unclear, the threads just start on the same core and after some time they are rescheduled to a different one. – bestsss Jun 30 '11 at 08:01
  • @StephenC you mention "Descriptors needs to be created, initialized and added to JVM internal data structures." what does this mean? file descriptors? and what does internal data structures refer to? are you basically saying that now the OS creates certain object locks? – David T. Dec 04 '13 at 20:18
  • @DavidT. Private descriptors in the JVM (user-space) that (for example) hold the native thread identifier corresponding to a Thread. (Check the OpenJDK codebase ...). Also there are native thread descriptors in kernel space. No I'm not referring to object locks. (The object lock descriptors typically only get created when an object lock is used. That's independent from this.) – Stephen C Dec 05 '13 at 02:46
91

Others have discussed where the costs of threading come from. This answer looks at the cost in relative terms: creating a thread is not that expensive compared to many operations, but it is expensive compared to the alternative ways of executing a task.

The most obvious alternative to running a task in another thread is to run the task in the same thread. This can be hard to accept if you assume that more threads are always better, but the logic is simple: if the overhead of handing the task to another thread is greater than the time you save, it is faster to perform the task in the current thread.

Another alternative is to use a thread pool. A thread pool can be more efficient for two reasons: (1) it reuses threads that have already been created, and (2) you can tune/control the number of threads to ensure optimal performance.

The following program prints....

Time for a task to complete in a new Thread 71.3 us
Time for a task to complete in a thread pool 0.39 us
Time for a task to complete in the same thread 0.08 us
Time for a task to complete in a new Thread 65.4 us
Time for a task to complete in a thread pool 0.37 us
Time for a task to complete in the same thread 0.08 us
Time for a task to complete in a new Thread 61.4 us
Time for a task to complete in a thread pool 0.38 us
Time for a task to complete in the same thread 0.08 us

This is a test for a trivial task which exposes the overhead of each threading option. (This test task is the sort of task that is actually best performed in the current thread.)

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class ThreadCreationBenchmark {
    public static void main(String[] args) throws InterruptedException {
        // Each task posts a token to the queue; the main thread takes one
        // token per task so it knows when all the work has completed.
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<Integer>();
        Runnable task = new Runnable() {
            @Override
            public void run() {
                queue.add(1);
            }
        };

        for (int t = 0; t < 3; t++) {
            {
                // A brand-new Thread per task.
                long start = System.nanoTime();
                int runs = 20000;
                for (int i = 0; i < runs; i++)
                    new Thread(task).start();
                for (int i = 0; i < runs; i++)
                    queue.take();
                long time = System.nanoTime() - start;
                System.out.printf("Time for a task to complete in a new Thread %.1f us%n", time / runs / 1000.0);
            }
            {
                // A fixed-size thread pool that reuses its threads.
                int threads = Runtime.getRuntime().availableProcessors();
                ExecutorService es = Executors.newFixedThreadPool(threads);
                long start = System.nanoTime();
                int runs = 200000;
                for (int i = 0; i < runs; i++)
                    es.execute(task);
                for (int i = 0; i < runs; i++)
                    queue.take();
                long time = System.nanoTime() - start;
                System.out.printf("Time for a task to complete in a thread pool %.2f us%n", time / runs / 1000.0);
                es.shutdown();
            }
            {
                // No extra thread at all: run the task in the current thread.
                long start = System.nanoTime();
                int runs = 200000;
                for (int i = 0; i < runs; i++)
                    task.run();
                for (int i = 0; i < runs; i++)
                    queue.take();
                long time = System.nanoTime() - start;
                System.out.printf("Time for a task to complete in the same thread %.2f us%n", time / runs / 1000.0);
            }
        }
    }
}

As you can see, creating a new thread only costs ~70 µs. This could be considered trivial in many, if not most, use cases. Relatively speaking, though, it is more expensive than the alternatives, and for some situations a thread pool, or not using threads at all, is the better solution.

Peter Lawrey
  • That's a great piece of code there. Concise, to the point and clearly displays its gist. – Nicholas Feb 07 '13 at 20:33
  • In the last block, I believe the result is skewed, because in the first two blocks the main thread is removing in parallel as the worker threads are putting. However in the last block, the action of taking is all performed serially, so it is dilating the value. You could probably use queue.clear() and use a CountDownLatch instead to wait for the threads to complete. – Victor Grazi Sep 09 '13 at 03:20
  • @VictorGrazi I am assuming you want to collect the results centrally. It is doing the same amount of queuing work in each case. A count down latch would be slightly faster. – Peter Lawrey Sep 09 '13 at 07:43
  • Actually, why not just have it do something consistently fast, like incrementing a counter; drop the whole BlockingQueue thing. Check the counter at the end to prevent the compiler from optimizing out the increment operation – Victor Grazi Sep 10 '13 at 01:52
  • @grazi you could do that in this case but you wouldn't in most realistic cases as waiting on a counter might be inefficient. If you did that the difference between the examples would be even greater. – Peter Lawrey Sep 10 '13 at 06:11
  • I believe this is a bad example. The worse performance most likely has to do with fighting over the blocking queue, and not at all with the thread creation cost/overhead. – dagnelies Nov 20 '17 at 16:06
33

In theory, this depends on the JVM. In practice, every thread has a relatively large amount of stack memory (256 KB by default, I think). Additionally, threads are implemented as OS threads, so creating them involves an OS call, i.e. a context switch.

Do realize that "expensive" in computing is always very relative. Thread creation is very expensive relative to the creation of most objects, but not very expensive relative to a random harddisk seek. You don't have to avoid creating threads at all costs, but creating hundreds of them per second is not a smart move. In most cases, if your design calls for lots of threads, you should use a limited-size thread pool.
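
For instance, a limited-size pool with a bounded work queue might be sketched like this (the pool size, queue capacity and rejection policy are illustrative choices, not recommendations):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
    public static void main(String[] args) {
        int poolSize = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize,                         // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,                  // core threads never time out
                new ArrayBlockingQueue<>(1000),             // bounded queue of pending tasks
                new ThreadPoolExecutor.CallerRunsPolicy()); // push back on submitters when full

        for (int i = 0; i < 10_000; i++) {
            pool.execute(() -> {
                // handle one request here
            });
        }
        pool.shutdown();
    }
}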

Michael Borgwardt
11

There are two kinds of threads:

  1. Proper threads: these are abstractions around the underlying operating system's threading facilities. Thread creation is, therefore, as expensive as the system's -- there's always an overhead.

  2. "Green" threads: created and scheduled by the JVM itself, these are cheaper, but no real parallelism occurs. They behave like threads, but are executed within the JVM's thread in the OS. They are not often used, to my knowledge.

The biggest factor I can think of in the thread-creation overhead is the stack size you have defined for your threads. The thread stack size can be passed as a parameter when running the VM (the -Xss option).

Other than that, thread creation is mostly OS-dependent, and even VM-implementation-dependent.

Now, let me point something out: creating threads is expensive if you're planning on firing 2000 threads per second, every second of your runtime. The JVM is not designed to handle that. If you'll have a couple of stable workers that won't be fired and killed over and over, relax.
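
For what it's worth, here is a minimal sketch of that "couple of stable workers" idea (class and thread names are illustrative): the threads are created once at startup and then reused for every task pulled off a shared queue.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StableWorkersSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> work = new LinkedBlockingQueue<>();

        // Two long-lived workers, created exactly once.
        for (int i = 0; i < 2; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        work.take().run();              // block until a task arrives
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // exit cleanly on interrupt
                }
            }, "worker-" + i);
            worker.setDaemon(true);
            worker.start();
        }

        // Submitting work is now just a queue insert - no thread creation involved.
        work.offer(() -> System.out.println("handled by " + Thread.currentThread().getName()));
        Thread.sleep(100); // give the workers a moment before the JVM exits (demo only)
    }
}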

salezica
  • *"... a couple of stable workers that won't be fired and killed ..."* Why did I start thinking about workplace conditions? :-) – Stephen C Mar 30 '11 at 07:49
6

Creating Threads requires allocating a fair amount of memory, since it has to make not one but two new stacks (one for Java code, one for native code). Use of Executors/thread pools can avoid the overhead by reusing threads for multiple tasks submitted to the Executor.

Philip JF
  • @Raedwald, what is the jvm that uses separate stacks? – bestsss Jun 30 '11 at 12:46
  • As far as I know, all JVMs allocate two stacks per thread. It is helpful for garbage collection to treat Java code (even when JITed) differently from free-casting C. – Philip JF Jul 02 '11 at 05:09
  • @Philip JF Can you please elaborate? What do you mean by 2 stacks one for Java code and one for native code? What does it do? – Gurinder Apr 28 '18 at 18:52
  • *"As far as I know, all JVMs allocate two stacks per thread."* - I have never seen any evidence that would support this. Perhaps you are misunderstanding the true nature of the opstack in the JVM spec. (It is a way of modelling the behavior of bytecodes, not something that needs to be used at runtime to execute them.) – Stephen C Nov 10 '19 at 00:32
2

Obviously, the crux of the question is what 'expensive' means.

A thread needs to create a stack and initialize the stack based on the run method.

It needs to set up control status structures, i.e. what state it is in: runnable, waiting, etc.

There's probably a good deal of synchronization around setting these things up.
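
Those states are visible from Java code via Thread.getState(); a small illustrative sketch:

public class ThreadStateSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
            }
        });

        System.out.println(t.getState()); // NEW - no native thread or stack exists yet
        t.start();
        Thread.sleep(50);
        System.out.println(t.getState()); // typically TIMED_WAITING while it sleeps
        t.join();
        System.out.println(t.getState()); // TERMINATED - its resources can be released
    }
}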

MeBigFatGuy