Overhead of multiple synchronized on the same object

Question

Consider this code:

void A() {
    synchronized (obj) {
        for (int i = 0; i < 1000; i++) {
            B();
        }
    }
}

void B() {
    synchronized (obj) {
        // Do something
    }
}

How much overhead will "synchronized" add when calling A? Will it be close to the overhead of a single "synchronized"?

What do you mean by "overhead"? Memory consumption? Runtime? Something else? — Korashen, Nov 15 '18 at 16:26
http://etutorials.org/Programming/Java+performance+tuning/Chapter+10.+Threading/10.4+Synchronization+Overhead/ — fantaghirocco, Nov 15 '18 at 16:27
It will be more, as synchronized is getting called 1000 times more than in the single call use case. So the VM has to put 1000 additional lock tokens on `obj`. — Korashen, Nov 15 '18 at 16:28
@Korashen Won't it understand that it has already acquired the lock? — Shayan, Nov 15 '18 at 16:29
Right now the overhead is immaterial since the code's not being invoked. We can't talk about theoretical overhead since we don't know if you're running this in a VM with 1 core or on an Epyc server with *plenty* of cores. You would be best served profiling this first before coming to us with questions, since right now, all we're going to be doing is guessing. — Makoto, Nov 15 '18 at 16:30
@Shayan, it does. And that is the reason why another lock token can be placed. — Korashen, Nov 15 '18 at 16:31
The overhead of the lock is not changed by how many times it is used in code. The overhead is: a) the time to acquire the lock, b) the housekeeping of holding and releasing the lock. — Peter Lawrey, Nov 15 '18 at 16:31
@PeterLawrey But every call of `synchronized(obj)` acquires another lock, which therefore increases the overhead, as you also stated — Korashen, Nov 15 '18 at 16:32
@Korashen the cost of synchronizing while already holding the lock is **much** lower, and there is nothing to wait for. — Peter Lawrey, Nov 15 '18 at 16:35

Fedor Losev · Answer 1 · 2018-11-15T18:00:59.047

The answer to this (legitimate) question depends on the OS, hardware and specific VM implementation.

Putting aside the cost of a function call, it may cost next to nothing on one OS/architecture (consider a modern processor/OS/VM) and much more on another (consider pure software processor emulation). On a VM with a single green thread it may cost close to zero (apart from the call overhead). The cost will differ even between ARM and Intel processors of comparable power.

synchronized is usually implemented inside a VM using OS synchronization primitives, with some heuristics to speed up common cases. The OS, in turn, uses hardware instructions and heuristics to perform this task. Usually, re-acquiring a synchronization primitive that the same thread already holds is exceptionally efficient in an OS and very efficient on a typical production-grade VM.

On a modern Windows/Linux VM and an Intel/AMD processor, it usually costs only a few CPU cycles (assuming an otherwise idle machine) and is in the low nanoseconds range.

Note that, in general, this is a very complex topic. Multiple layers of software and hardware (and the effect of other tasks running on the same hardware resources) are involved. Rigorous research into even a small sub-topic here could fill multiple Ph.D. theses.

In practice, though, my advice is to assume that the cost of a second synchronized in small loops is zero unless you encounter a particular bottleneck (which is quite unlikely).

If there is a large number of iterations, it will definitely increase the cost compared with a single synchronized, and the overall effect depends on what you are doing inside the loop. Usually, there is some work in each iteration that makes the relative overhead negligible. In some cases, though, it may prevent loop optimization and add a substantial overhead (substantial compared with a single synchronized, not as a practical measure). However, in common practical cases of huge loops, one should think about a different design and avoid the outer synchronized altogether to reduce lock contention.
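If you want a rough feel for this on your own machine, a crude timing sketch along the following lines can help. It is not a rigorous benchmark (a harness such as JMH is the proper tool), and the class and method names (`NestedSyncTiming`, `lockedStep`, `plainStep`, `timeLoop`) are made up for illustration. Also keep in mind that a JIT compiler may optimize the nested lock away, so the two numbers can end up nearly identical.

```java
// Crude timing sketch, assuming a single thread and an uncontended lock.
// All names here are hypothetical; this is not a substitute for a real benchmark harness.
class NestedSyncTiming {
    private static final Object obj = new Object();
    private static long counter = 0;

    // Mirrors B() from the question: re-enters the monitor the caller already holds.
    static void lockedStep() {
        synchronized (obj) {
            counter++;
        }
    }

    // Same work without the inner synchronized; only safe while the caller holds obj.
    static void plainStep() {
        counter++;
    }

    // Runs `iterations` steps under one outer lock and returns the elapsed nanoseconds.
    static long timeLoop(int iterations, boolean nested) {
        long start = System.nanoTime();
        synchronized (obj) {
            for (int i = 0; i < iterations; i++) {
                if (nested) {
                    lockedStep();
                } else {
                    plainStep();
                }
            }
        }
        return System.nanoTime() - start;
    }

    static long getCounter() {
        synchronized (obj) {
            return counter;
        }
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        // Warm-up rounds so the JIT has a chance to compile both paths.
        for (int round = 0; round < 10; round++) {
            timeLoop(iterations, true);
            timeLoop(iterations, false);
        }
        long nestedNs = timeLoop(iterations, true);
        long plainNs = timeLoop(iterations, false);
        System.out.printf("nested: %.2f ns/iteration, plain: %.2f ns/iteration%n",
                (double) nestedNs / iterations, (double) plainNs / iterations);
    }
}
```

Treat the printed numbers as an order-of-magnitude hint only; they will vary with the VM, its flags, the hardware and whatever else is running.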

To get a sense of a VM implementation, you may look, for example, at the Synchronization section of this paper. It is a bit outdated but straightforward to understand.

Peter Lawrey · Answer 2 · 2018-11-15T16:52:29.660

synchronized locks are re-entrant, and acquiring a lock when the thread already holds it costs a) the time to check that it already holds the lock, and b) the time to increment a counter and later decrement it.

The first one takes the longest and adds about 10 - 50 ns each time. By that estimate, the 1000 nested acquisitions in the question add roughly 10 - 50 µs in total.
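The counter idea can be observed with `java.util.concurrent.locks.ReentrantLock`, which exposes its hold count. This is only an analogy: the intrinsic monitor behind synchronized is implemented differently inside the VM and has no such accessor. The class name `HoldCountDemo` and the method `holdCountAtDepth` are made up for this sketch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: uses ReentrantLock as a stand-in for the intrinsic monitor,
// because ReentrantLock lets us read the re-entrancy counter.
class HoldCountDemo {
    private static final ReentrantLock lock = new ReentrantLock();

    // Acquires the lock `depth` times on the current thread (nested) and returns
    // the hold count observed at the innermost level.
    static int holdCountAtDepth(int depth) {
        lock.lock(); // if this thread already owns the lock, this just bumps the counter
        try {
            return depth <= 1 ? lock.getHoldCount() : holdCountAtDepth(depth - 1);
        } finally {
            lock.unlock(); // decrements the counter; the lock is released when it reaches 0
        }
    }

    public static void main(String[] args) {
        System.out.println(holdCountAtDepth(2)); // prints 2: outer plus inner acquisition
        System.out.println(lock.getHoldCount()); // prints 0: every lock() was matched by unlock()
    }
}
```

The nested `lock()` call never blocks, because the owner check succeeds immediately; that is the same reason B() in the question does not have to wait when it is called from A().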

edited Nov 15 '18 at 16:52

answered Nov 15 '18 at 16:34
