While trying to understand how Java's VarHandle "getAndSet" works as opposed to its JDK 7 counterpart:
public final V getAndSet(V newValue) {
    while (true) {
        V x = get();
        if (compareAndSet(x, newValue))
            return x;
    }
}
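For comparison, here is a minimal sketch (not the actual JDK source) of what a VarHandle-based counterpart typically looks like; the whole operation collapses into a single VarHandle.getAndSet call, leaving the JIT free to emit one atomic exchange instead of a user-level retry loop:

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: a getAndSet built directly on a VarHandle.
class Box<V> {
    private volatile V value;

    private static final VarHandle VALUE;
    static {
        try {
            // Generic field erases to Object, so the VarHandle is looked up as Object.
            VALUE = MethodHandles.lookup().findVarHandle(Box.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unchecked")
    public final V getAndSet(V newValue) {
        // One call, no spin in user code.
        return (V) VALUE.getAndSet(this, newValue);
    }
}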
I noticed something...
The form of backpressure handling done by the JDK 7 version is not sequentially consistent: if a thread fails the compareAndSet and retries, it may lose its position to some thread that arrived later.
Even if the operation is deemed "atomic", an implicit queueing buffer is built up on its spin... and this buffer is not sequentially consistent, even though the atomicity is provided by the compareAndSet action.
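To make that implicit buffer visible, here is a rough, hypothetical sketch (not JDK code) of the same JDK 7 style spin with a retry counter bolted on; every increment is one round in which a thread lost its slot, possibly to a thread that arrived after it:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Illustration only: the retry counter just exposes the "queue" of losing attempts.
class SpinSwap<V> {
    final AtomicReference<V> ref = new AtomicReference<>();
    final AtomicLong retries = new AtomicLong();

    V getAndSetCountingRetries(V newValue) {
        while (true) {
            V x = ref.get();
            if (ref.compareAndSet(x, newValue))
                return x;                  // won this round
            retries.incrementAndGet();     // lost, possibly to a later arrival
        }
    }
}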
I am not entirely sure why JDK 7 uses a test (the "compare" in compareAndSet) to perform what is really just a swapping action... when a person performs the physical action of swapping something, that person is not interested in comparing anything...
The compareAndSet employs the cmpxchg instruction, so... my immediate question was... why not use just the xchg instruction for getAndSet? By eliminating the test failure, a single buffering queue would build up... held by the native LOCK prefix.
And so, while I am not sure whether VarHandle DOES in fact use xchg alone (DOES IT?), the question still remains whether the machine handles the backpressure build-up in a sequentially consistent manner...
(Here comes the essay... sorry.)
(edit:
At first I thought that with mfence one could achieve this FIFO queue... but in reality there is no way for the processor to enforce any policy that defines such a FIFO queueing of "assigned scheduled instructions"; instead the processor's scheduler "may use different algorithms and policies to determine the order and priority of the threads, such as round-robin, shortest job first, priority-based, etc. Some of these algorithms may approximate a FIFO order, but they are not strictly enforced by the processor".)
And there is no evidence that the lock prefix is doing so.
In fact, the architecture may handle this below the instruction level entirely:
The queuing mechanism used by memory controllers is typically managed at the hardware level and is influenced by the overall design of the memory subsystem, including cache hierarchies, memory interconnects, and bus arbitration protocols.
Instructions in modern processors are designed to perform specific computations and memory operations, rather than directly control low-level hardware details like queuing mechanisms in memory controllers. Processor instructions are more concerned with specifying the operations to be performed on data, such as arithmetic operations, memory loads and stores, control flow instructions, etc.
Please don't ask me who I'm quoting... I think you know "what" said this...
... On reactive systems... Or...[ Why NO native FIFO instruction queue? ]
It seems reasonable that the processor's greater concern would be "specifying the operations to be performed on data" (what one would call "updates") rather than defining queueing strategies.
The thing about updates is that an update always carries the previous state implicitly, and so... the VERSION is implicit in the "form" of the current data. If, let's say, the operand says the update is i -> i + 1, we can safely assume that the previous version was current - 1. This may imply that defining a queueing strategy is redundant, since the end result will still reflect all changes performed, and a version can be inferred from each change simply by checking its current state.
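A small sketch of that point, using a hypothetical counter: for an update like i -> i + 1 the arrival order is irrelevant, because the final state still accounts for every applied change:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class UpdateDemo {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        // 1_000 concurrent i -> i + 1 updates, in whatever order they happen to land.
        IntStream.range(0, 1_000).parallel()
                 .forEach(ignored -> counter.updateAndGet(i -> i + 1));
        // The end state reflects all updates regardless of ordering.
        System.out.println(counter.get()); // 1000
    }
}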
But this is not how version-based systems work (publisher architectures, aka "displays")... In version-based systems historical sequential consistency is the most important factor, especially since changes may completely disregard/override previous states, leaving no way to figure out what those states were... The output does NOT HAVE MEMORY.
And such is the case of display events, where previous states are completely disregarded.
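A toy, hypothetical sketch of that "no memory" output: a latest-value-only publisher where every publish overrides what was there, so nothing about history survives in the output:

import java.util.concurrent.atomic.AtomicReference;

class LatestFrame<T> {
    private final AtomicReference<T> latest = new AtomicReference<>();

    void publish(T frame) {
        latest.set(frame);   // unconditionally overrides the previous state
    }

    T render() {
        return latest.get(); // the output has no memory of skipped frames
    }
}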
This is why I believe most reactive systems are doing too much when used for such menial tasks as web or mobile application displays.
Their buffering management is seldom used; only in specific cases like video streaming is it really needed. And if we get pedantic about meaning... even video streaming is somewhat limited by the speed at which our eyes process information, so there is a limit to the number of processes that need buffering (assuming a good connection, which I know is the main/real reason for that buffer).
Defining a queuing mechanism on the lock prefix would create a difference similar to the dynamic between the "volatile" and "synchronized" keywords.
Non-FIFO locks would be used for update actions, where memory management prioritizes performance via reordering, thread priority, etc.; used in databases and streaming operations... where no calls are left behind.
Strict FIFO locks would just queue in order of arrival; used in publishing operations like displays, where dropping instructions is always an alternative to relieve backpressure.
So, one pipeline for writing/input (Stateful) and the other for reading/output (Stateless).
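As an aside, the JDK already exposes both flavours at the library level; the sketch below is only an analogy for the two pipelines above, not a proposal for a new instruction:

import java.util.concurrent.locks.ReentrantLock;

class LockFlavours {
    // Non-fair: waiting threads may be overtaken by newcomers, favouring throughput
    // (the "update"/write pipeline, where no call is dropped but order is flexible).
    static final ReentrantLock NON_FIFO = new ReentrantLock();

    // Fair: the lock is granted roughly in arrival order
    // (the "publish"/display pipeline, where order of arrival is the point).
    static final ReentrantLock FIFO = new ReentrantLock(true);
}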