Is it possible? Yes, sort of: the Linux kernel is a living example.
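In the Linux kernel, when RCU is used, the previous version of the data structure is reclaimed only after a grace period. In the classic non-preemptible implementation, a CPU that has passed through `schedule()` cannot still be inside a read-side critical section, so once every CPU has done so it is known that all pre-existing readers have completed.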
Of course, the Linux kernel does not have a garbage collector, and reclamation of unreachable memory is generally explicit and immediate. The RCU update is a special case, where reclamation is explicit but not immediate.
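To make "explicit but not immediate" concrete, here is a toy, single-threaded model of deferred reclamation. It is only a sketch: `ToyRcuDomain` and its methods are names I made up, and it tracks readers individually, whereas the kernel infers reader completion from context switches.

```python
class ToyRcuDomain:
    """Toy model of deferred reclamation; not a real RCU implementation."""

    def __init__(self):
        self.active_readers = set()
        self.retired = []  # (name, readers that may still see the old version)
        self.freed = []
        self._next_id = 0

    def read_lock(self):
        reader_id = self._next_id
        self._next_id += 1
        self.active_readers.add(reader_id)
        return reader_id

    def read_unlock(self, reader_id):
        self.active_readers.discard(reader_id)
        self._reclaim()

    def retire(self, name):
        # Only readers that are already running can still hold the old version.
        self.retired.append((name, set(self.active_readers)))
        self._reclaim()

    def _reclaim(self):
        still_waiting = []
        for name, waiting_for in self.retired:
            waiting_for &= self.active_readers
            if waiting_for:
                still_waiting.append((name, waiting_for))
            else:
                self.freed.append(name)  # "grace period" over, free it
        self.retired = still_waiting


domain = ToyRcuDomain()
r1 = domain.read_lock()
domain.retire("v1")                # explicit, but r1 may still be using v1
freed_while_reader_active = list(domain.freed)
r2 = domain.read_lock()            # started after the retire, cannot see v1
domain.read_unlock(r1)             # last pre-existing reader is done
freed_after_reader_done = list(domain.freed)
```

Note that the later reader `r2` does not delay reclamation: only readers that existed at retire time matter.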
Is it possible for a general-purpose VM like Python or JavaScript? It will be hard:
- RCU needs a garbage collector
- RCU is made for read-mostly workloads
- RCU is made for short critical sections
RCU does not replace the garbage collector: it still needs one. Rather, RCU working together with a garbage collector avoids locking most of the time, namely whenever a read-side critical section completes without a concurrent write.
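A minimal sketch of that division of labour, in Python: readers take no lock, a writer copies, updates the copy, and publishes it, and the old version is left to the garbage collector. `RcuBox` is a hypothetical helper, and the sketch assumes that rebinding an attribute is atomic with respect to readers, which I believe holds in CPython but is not something I would rely on in every VM.

```python
import threading


class RcuBox:
    """Hypothetical read-copy-update wrapper around a dict."""

    def __init__(self, initial):
        self._current = dict(initial)
        self._write_lock = threading.Lock()  # serialises writers only

    def read(self):
        # Readers take no lock: they use whichever version is current.
        # Callers must treat the returned dict as read-only.
        return self._current

    def update(self, key, value):
        with self._write_lock:
            new_version = dict(self._current)  # copy
            new_version[key] = value           # update the private copy
            self._current = new_version        # publish in one assignment
        # The old version is not freed here. It goes away when the last
        # reader drops its reference, which is the garbage collector's job.


box = RcuBox({"a": 1})
snapshot = box.read()   # a reader that started before the update
box.update("a", 2)
latest = box.read()     # a reader that started after the update
```

The earlier reader keeps seeing a consistent old version, which is exactly why something has to reclaim that version later.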
Read-mostly workloads. Reference counting is particularly write-heavy, because merely taking a reference to an object writes to its refcount. So much so that the multithreaded Python VM (CPython) has a GIL, which among other things prevents concurrent refcount updates; making those updates atomic would incur a cache synchronisation penalty. Thus, some other garbage collection technique is required.
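A quick way to see that taking a reference is a write, assuming CPython (`sys.getrefcount` is CPython-specific and the absolute numbers vary between versions, so only the difference is meaningful here; the function name is mine):

```python
import sys


def refcount_delta_from_extra_reference():
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj]  # merely storing a reference writes to obj's refcount
    after = sys.getrefcount(obj)
    return after - before


delta = refcount_delta_from_extra_reference()
```

So a "read-only" traversal of a shared structure still dirties the cache lines of every object it touches, which is the opposite of what RCU readers want.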
Meanwhile, a naive JavaScript implementation doesn't need synchronisation at all, as it's single-threaded (although it's possible to imagine a JavaScript implementation where garbage collection is offloaded to a separate thread).
The length of a critical section in a dynamic language VM is particularly hard to predict, because of incessant indirection. For example, consider `int(code.replace(" ", ""))`: `int` may be overloaded via `__int__`, `.replace` may be overloaded through a property, and `(...)` may be overloaded via `__call__`. Each overload is Python code that could take arbitrarily long. The same applies to built-in data structures, where the update (last statement) of `c=1; d={c:42}; d[c]=43` could internally use RCU for something, except it must be very careful because in general `c` might be an object that implements `__hash__`, which may take arbitrarily long.
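To illustrate the dict case, here is a small sketch with a made-up key class (`SlowKey` and its `delay` parameter are mine) whose `__hash__` runs user code in the middle of what looks like a trivial built-in update:

```python
import time


class SlowKey:
    """Hypothetical key whose __hash__ runs arbitrary user code."""

    def __init__(self, value, delay):
        self.value = value
        self.delay = delay
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        time.sleep(self.delay)  # stands in for arbitrarily long user code
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, SlowKey) and self.value == other.value


c = SlowKey(1, delay=0.01)
d = {c: 42}  # calls c.__hash__()
d[c] = 43    # calls c.__hash__() again, as part of the update
calls_during_build_and_update = c.hash_calls
```

If the dict held an RCU read-side critical section open across that call, its length would be whatever the user's `__hash__` decides it is.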
I'm afraid I don't know enough about compiled languages and their VMs.
My gut feeling is that novel, high-performance garbage collectors could indeed use RCU internally, and then perhaps expose RCU to the implementation of the built-in data structures. I think the OS may need to provide a better API to pin execution to specific cores (to benefit from the local cache) and/or to run custom code when user-land is preempted.
While this is not a full answer, I hope this extended commentary helps to circumscribe the original question.