
I'm having some trouble understanding what the lifecycle of lock records is when dealing with "thin-locks" on HotSpot.

My understanding is that:

When a thread T first attempts to acquire a lock on object o, it triggers a "thin lock" creation -- a lock record is created on T's stack, in the current frame F, and a copy of o's mark word (which will from now on be referred to as the displaced header) plus a reference to o is stored in it. Through a CAS operation, o's header is made to point to the lock record (and the last two bits are set to 00 to mark this object as thin-locked!).

There are multiple reasons why the CAS operation could fail, though:

  • Another thread was quicker to grab the lock; we'll need to turn this thin lock into a full-blown monitor instead;
  • The CAS failed, but the value in o's header can be seen to point into T's stack, so we must be attempting to re-enter a lock we already hold, which is fine. In that case, the displaced header in the new lock record of the current stack frame is kept at zero (null) -- I've tried to sketch this fast path below.
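
To check my understanding, here's a rough Java model of that fast path. This is NOT HotSpot code (the real thing is C++ and works on the object header directly); `LockRecord`, `recordAddress` and `pointsIntoMyStack` are names I made up purely for illustration:

```java
// A rough model of the enter fast path as I understand it -- NOT HotSpot code.
import java.util.concurrent.atomic.AtomicLong;

class LockRecord {                    // lives in the current frame F
    Object lockee;                    // reference to o
    long displacedHeader;             // copy of o's mark word, or 0 if recursive
}

class ThinLockModel {
    // stand-in for o's header word; in HotSpot this is part of the object itself
    final AtomicLong markWord = new AtomicLong();

    boolean fastEnter(LockRecord rec, Object o, long recordAddress) {
        long unlockedMark = markWord.get();
        rec.lockee = o;
        rec.displacedHeader = unlockedMark;          // the "displaced header"
        // Try to make o's header point to the lock record (low bits 00 = thin-locked).
        if (markWord.compareAndSet(unlockedMark, recordAddress)) {
            return true;                             // we now own the thin lock
        }
        if (pointsIntoMyStack(markWord.get())) {     // header points into T's own stack
            rec.displacedHeader = 0;                 // recursive re-entry: zero the record
            return true;
        }
        return false;                                // real contention: inflate to a monitor
    }

    private boolean pointsIntoMyStack(long headerValue) {
        // Placeholder: HotSpot compares the value against the thread's stack bounds.
        return false;
    }
}
```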

Given this, I have a couple of questions:

  1. Why would we create a new lock record each time we attempt to enter a lock? Wouldn't it be preferable to just keep a single lock record for each object o?
  2. When leaving a synchronized block, I fail to understand how the VM can know whether it should release the lock or whether we're still "unwinding" from a recursive lock.

Could anyone shed some light on this?

References

Let me quote a paragraph from the last link:

Whenever an object is lightweight locked by a monitorenter bytecode, a lock record is either implicitly or explicitly allocated on the stack of the thread performing the lock acquisition operation. The lock record holds the original value of the object’s mark word and also contains metadata necessary to identify which object is locked. During lock acquisition, the mark word is copied into the lock record (such a copy is called a displaced mark word), and an atomic compare-and-swap (CAS) operation is performed to attempt to make the object’s mark word point to the lock record. If the CAS succeeds, the current thread owns the lock. If it fails, because some other thread acquired the lock, a slow path is taken in which the lock is inflated, during which operation an OS mutex and condition variable are associated with the object. During the inflation process, the object’s mark word is updated with a CAS to point to a data structure containing pointers to the mutex and condition variable. During an unlock operation, an attempt is made to CAS the mark word, which should still point to the lock record, with the displaced mark word stored in the lock record. If the CAS succeeds, there was no contention for the monitor and lightweight locking remains in effect. If it fails, the lock was contended while it was held and a slow path is taken to properly release the lock and notify other threads waiting to acquire the lock. Recursive locking is handled in a straightforward fashion. If during lightweight lock acquisition it is determined that the current thread already owns the lock by virtue of the object’s mark word pointing into its stack, a zero is stored into the on-stack lock record rather than the current value of the object’s mark word. If zero is seen in a lock record during an unlock operation, the object is known to be recursively locked by the current thread and no update of the object’s mark word occurs. The number of such lock records implicitly records the monitor recursion count. This is a significant property to the best of our knowledge not attained by most other JVMs.

Thanks

devoured elysium

2 Answers


Why would we create a new lock record each time we attempt to enter a lock? Wouldn't it be preferable to just keep a single lock record for each object o?

It seems you've missed the main point of lock records. A lock record is not a per-object entity, but rather a per-lock-site one. If, for example, a method has 3 synchronized blocks, its stack frame may have up to 3 lock records, no matter whether they lock 3 different objects or the same object recursively locked 3 times.
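
For example (hypothetical code, purely to illustrate the point):

```java
// Purely illustrative: one stack frame, up to three monitor slots.
class LockSites {
    void work(Object a, Object b) {
        synchronized (a) {          // monitor slot #1 in this frame
            synchronized (b) {      // monitor slot #2
                synchronized (a) {  // monitor slot #3 (recursive lock on a)
                    // Three lock sites -> up to three lock records here,
                    // whether or not a and b refer to the same object.
                }
            }
        }
    }
}
```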

Lock records (actually, they are not called so in HotSpot sources; they are usually referred to as a "monitor", "monitor slot", "monitors block", etc.) help to maintain the mapping between a stack frame and its locked monitors. In particular, when a stack frame is removed due to an exception, all locks need to be automatically released. So, think of the monitor slots as something like local variable slots, which can hold references to the same or different objects. Like local variables, monitors are associated with a given stack frame. They hold references to the locked objects, but they are not "locks" themselves.

When leaving a synchronized block, I fail to understand how the VM can know whether it should release the lock or whether we're still "unwinding" from a recursive lock.

A lock record (a monitor slot) holds two things: a reference to the locked object and a so-called "displaced header". The displaced header is either the previous (unlocked) value of the object header, or zero if it was a recursive lock.

As I explained above, if we lock an object 3 times, there will be 3 lock records. Only the first one holds the actual non-zero displaced header; the other two will have zeros. This means the first two monitorexit instructions will pop lock records containing zeros, realize that it is a recursive lock, and thus not update the object. When the last lock record is removed, the JVM sees a non-zero value in the displaced header and stores it back into the real object header, thus marking it unlocked.
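
Very roughly, the monitorexit fast path can be pictured like this. Again, this is not HotSpot code, just a toy model; `LockRecord`, `recordAddress` and `slowExit` are made-up names:

```java
// A toy model of the monitorexit fast path -- not HotSpot code.
import java.util.concurrent.atomic.AtomicLong;

class ExitModel {
    static class LockRecord {
        long displacedHeader;   // 0 means "this was a recursive entry"
        long recordAddress;     // stand-in for the record's address on the stack
    }

    final AtomicLong markWord = new AtomicLong();   // stand-in for o's header word

    void fastExit(LockRecord rec) {
        if (rec.displacedHeader == 0) {
            return;             // recursive unlock: pop the record, object stays locked
        }
        // Outermost record: try to put the original (unlocked) header back.
        // The header should still point to this very record if nobody contended.
        if (!markWord.compareAndSet(rec.recordAddress, rec.displacedHeader)) {
            slowExit();         // the lock was inflated meanwhile: release via the monitor
        }
    }

    private void slowExit() {
        // wake up threads waiting on the inflated monitor
    }
}
```

In other words, the exit path only needs to look at the displaced header of the record it pops; no explicit recursion counter is required.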

apangin
  • That's true for ordinary Java code. On the bytecode level, you could execute a `monitorenter` instruction more than once without having executed `monitorexit` yet, getting more lock records than `monitorenter` instructions. On the other hand, using those records to enforce "structured locking" is not necessary for ordinary Java code, as it will release the monitors even in the exceptional case, using exception handlers. Not doing so is considered an error, the same way as returning without releasing them. That's why the JVM will also generate an exception when it had to release dangling locks. – Holger Aug 16 '21 at 08:27
  • @Holger `synchronized void m() { throw new Error(); }` is an ordinary Java code, it doesn't have exception handlers, but the JVM still releases the lock. How do you think it does so, if not using lock records? Not even saying about `PopFrame`/`ForceEarlyReturn` features. – apangin Aug 16 '21 at 09:12
  • I was talking about `synchronized(…) {}` blocks and *ordinary Java code*, which does not include `PopFrame`/`ForceEarlyReturn` features. A `synchronized void m() { }` declaration is an entirely different beast; it does not even have associated `monitorenter`/`monitorexit` instructions. Further, it causes exactly one lock operation per invocation and surely could be implemented without supporting pushing an arbitrary number of records to the stack frame. – Holger Aug 16 '21 at 09:27

Note that this is my original attempt to answer the question. It's quite clear to me that the docs linked above answer everything on their own.

As for the first question, the new lock record is created on the stack because that's much cheaper than allocating it on the heap. In many cases a monitor is never contended, so this can be a huge win: stack allocation and freeing are so cheap that their cost is hardly worth considering.

The second question can be answered by noticing that the doc refers to the current frame F. There's always a frame pointer register, and so the monitorexit instruction can simply check if the current frame pointer matches the address of the thin lock record. If so, then it knows it's the last one out.

A key aspect is that the monitorenter/monitorexit instructions must be properly balanced, and the JVM tries to prove this. Otherwise, a reference count would be needed to detect when the last monitorexit instruction was reached. It seems that instead, HotSpot simply doesn't bother compiling the code or optimizing monitor acquisition if the monitorenter/monitorexit instructions aren't balanced.
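
For what it's worth, code compiled from Java source is always balanced in this sense: for a synchronized block, javac emits a monitorexit on the normal path and another one in a compiler-generated exception handler. The class below is just a made-up illustration, with the guaranteed bytecode shape noted in comments:

```java
// Illustrative only: what javac guarantees for a synchronized block.
// The compiled bytecode contains a monitorenter, a monitorexit on the
// normal path, and an exception-table entry whose handler also executes
// monitorexit before rethrowing, so enter/exit are always balanced in
// code produced from Java source.
class Balanced {
    private final Object lock = new Object();

    void m() {
        synchronized (lock) {   // monitorenter
            doWork();
        }                       // monitorexit (normal path)
                                // hidden handler: monitorexit + athrow (exceptional path)
    }

    private void doWork() { }
}
```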

boneill
  • Regarding 1, I think you're not answering my question: the question is not why the lock record is stored on the stack vs the heap, but why lock re-entrancy means creating one new lock record per re-entrancy. Calling a recursive function 5 times that needs to go through the same synchronization block locking the same variable `o` will create 5 lock records. – devoured elysium Aug 13 '21 at 02:53
  • @devouredelysium where did you get that re-entrancy will create a new lock record though? – Eugene Aug 13 '21 at 02:55
  • @Eugene: " During recursive acquires, the thread will notice that it already owns the lock by checking that the pointer is pointing somewhere into the thread’s stack. The lock record is in this case set to NULL (0), indicating that it is recursive. Unlock attempts will check the lock record, and if it is NULL the release operation simply returns, as the lock is still held recursively.", last link, but I think in one way or another that's restated in all references. – devoured elysium Aug 13 '21 at 02:56
  • @boneill: regarding your 2nd point, how would that work when you have something such as `void m() { synchronized(o) { synchronized(o) {} } }`? If all you're doing is looking at the stack frame to know whether you need to unlock a given object `o` when facing a `monitorexit`, you'd be unlocking it way too soon in this example. – devoured elysium Aug 13 '21 at 02:58
  • @devoured elysium: There's also the stack pointer, which will change as a result of the initial allocation. So perhaps that's compared instead of the frame pointer? Also, the doc discusses lock coarsening. The double synchronized blocks get turned into one. – boneill Aug 13 '21 at 03:01
  • @boneill eh no, this is not about coarsening, as that can only be done by the JIT, and thin locks can happen much sooner. – Eugene Aug 13 '21 at 03:03
  • I updated the OP with one more link and some text. – devoured elysium Aug 13 '21 at 03:03
  • @eugene the first doc states that HotSpot doesn't have thin locks at all, and that all the optimizations are done in the JIT. – boneill Aug 13 '21 at 03:04
  • I (consciously) used the wrong term -- thin-locks -- as that's the term that, for all intents and purposes, everyone uses, although it's technically not correct. Dave's writing can be quite chaotic, but he doesn't mean that a "lightweight" mechanism for locking is not used, just that thin locks, as defined in the original paper by David Bacon, are not the approach followed by HotSpot. – devoured elysium Aug 13 '21 at 03:09
  • "For synchronization, all the mark word encodings in HotSpot are **pointer-based**." vs "Thin-locks were devised by David Bacon at IBM research. Briefly, a thin lock is **value-based** and encodes an owner thread-id, recursion count, etc., into a single object header word. " – devoured elysium Aug 13 '21 at 03:16
  • As elaborated in [this answer](https://stackoverflow.com/a/54909060/2711488), there is a direct relationship between the number of nested monitor acquisitions and the stack size, which indicates that the HotSpot JVM does indeed create a new record on the stack each time. There's a simple reason for this: creating such a record is cheaper than checking whether there is already a record on the stack, and in normal applications the nesting count is not so high that it matters. – Holger Aug 13 '21 at 07:38