27

I'm trying to nut out the details for a true WeakKeyedDictionary<,> for C#... but I'm running into difficulties.

I realise this is a non-trivial task, but the seeming inability to declare a WeakKeyedKeyValuePair<,> (where the GC only follows the value reference if the key is reachable) makes it seemingly impossible.

There are two main problems I see:

  1. Every implementation I've so far seen does not trim values after keys have been collected. Think about that - one of the main reasons for using such a Dictionary is to prevent those values being kept around (not just the keys!) as they're unreachable, but here they are left pointed to by strong references.

    Yes, add/remove from the Dictionary enough and they'll eventually be replaced, but what if you don't?

  2. Without a hypothetical WeakKeyedKeyValuePair<,> (or another means of telling the GC to only mark the value if the key is reachable) any value that refers to it's key would never be collected. This is a problem when storing arbitrary values.

Problem 1 could be tackled in a fairly non-ideal/hackish way : use GC Notifications to wait for a full GC to complete, and then go along and prune the dictionary in another thread. This one I'm semi-ok with.

But problem 2 has me stumped. I realise this is easily countered by a "so don't do that", but it has me wondering - is this problem even possible to solve?

Community
  • 1
  • 1
Mania
  • 1,718
  • 14
  • 16

1 Answers1

33

Have a look at the ConditionalWeakTable<TKey, TValue> Class.

Enables compilers to dynamically attach object fields to managed objects.

It's essentially a dictionary where both the key and the value are a WeakReference, and the value is kept alive as long as the key is alive.

Note! This class does not use GetHashCode and Equals to do equality comparisons, it uses ReferenceEquals.

larsmoa
  • 12,604
  • 8
  • 62
  • 85
dtb
  • 213,145
  • 36
  • 401
  • 431
  • Nice find, most intriguing! How is it implemented I wonder? Without knowing this the problem becomes "is it possible to create a truly WeakValuedDictionary<,>". I'll dig in to reflector and see if I can figure it out... – Mania Dec 09 '11 at 04:54
  • 3
    What a shame, it appears it depends on an internal .NET-magic "DependentHandle" for it's implementation.. additionally it ignores .GetHashCode and .Equals, making it a substandard Dictionary at best :(. Additionally without access to DependentHandle, the problem has now moved to defining a WeakValuedDictionary<,>. I suppose this may be as close as we can get.. – Mania Dec 09 '11 at 05:05
  • 9
    @Mania The DependentHandle is the CLR implementation of [ephemerons](http://en.wikipedia.org/wiki/Ephemeron) which is impossible to implement without GC cooperation. It should have been made public like GCHandle. If you don't mind reflection trickery you can turn the static CLR methods into delegates and implement your own DependentHandle and WeakValuedDictionary. Be careful to study the .NET reference source (which is public), or use some decompiler, because it has tricky race conditions. – Zarat Apr 08 '12 at 20:57
  • @Zarat: If one wants to create a single "life-dependency-link" between two objects X and Y (such that the link will be considered as a strong reference of X only as long as a strong reference to Y exists), is there a nicer way to do that than creating a single-item `ConditionalWeakTable`? – supercat Jul 09 '12 at 19:43
  • @supercat: Yes, that's basically what ephemerons and the DependentHandle does, just create an instance with X and Y as parameters, however whether it is a 'nice' way depends on your opinion on reflection into the .NET implementation. (Of course thats not allowed in partially trusted applications.) – Zarat Jul 21 '12 at 18:25
  • @Zarat: Are you saying that `DependentHandle` is the native implementation of an ephemeron, trusted code might gain efficiency by using that, but untrusted code should just create a `ConditionalWeakTable` with one or two items in it (the desired linkage, plus possibly a linkage from the upstream side of the desired linkage table itself)? Also, I was curious what is guaranteed with respect to the relative behavior of `ConditionalWeakTable` and "fReachable" objects? Will objects which are freachable (but not strongly reachable) on one side of a `ConditionalWeakTable` be likewise on the other? – supercat Jul 23 '12 at 15:12
  • @supercat: Yes that's what I meant. Not sure what you mean with 'fReachable', to me ephemerons are just a way to tell the GC that an object is conditionally reachable: IF A is (strongly) reachable THEN B must also be marked (strongly) reachable. [I hope I didn't get that wrong, it's a while ago since I looked at that stuff.] – Zarat Jul 25 '12 at 10:28
  • @Zarat: An object is freachable if no strongly rooted references to it exist but it either has a registered finalizer, or it is reachable from an object with a registered finalizer. If an object is freachable, then in the next GC cycle it will not be eligible for collection, but if it has a finalizer it will be placed in the "freachable queue", which is a strongly-rooted list of objects whose finalizers should be run at earliest convenience. One of the constructors for `WeakReference` contains a `bool` parameter which indicates whether the `WeakReference` should be invalidated when... – supercat Jul 25 '12 at 15:18
  • ...its target becomes freachable, or only when its target ceases to exist. Generally, once an object becomes freachable, one should endeavor to replace it rather than reuse it, so having a `WeakReference` become invalid at that point is desirable. On the other hand, situations might arise where one needs to expedite the cleanup of an freachable object (e.g. because one wishes to create a new object using the same hardware resource). For that, one would use a "long" `WeakReference`. It would be interesting to know which behaviors ephemerons support. – supercat Jul 25 '12 at 15:28
  • @supercat: Ah, I see, just didn't know this under the term 'freachable'. A quick test shows that DependentHandle and ConditionalWeakTable are implemented as long weak references, they are not reset during the first collection pass. (My test: resurrect the object from the finalizer, it's still accessible in ConditionalWeakTable and long weak references, but the short weak reference has been cleared.) – Zarat Jul 26 '12 at 13:53
  • @Zarat: Which object were you resurrecting--the 'origin' or 'target' side of the link? Also, what sort of reference existed to the `ConditionalWeakTable` itself? The corner cases I'm interested in would be things like what happens if table `Tab` holds links from `A` to `Tab` and `B` to `C` (and there are no other references to the table). If `A` becomes freachable, but `B` is strongly reachable, what is the status of `C`? If a `WeakReference` becomes freachable, its target will be invalidated regardless of whether it's a long or short reference, but I don't know about a CWT. – supercat Jul 26 '12 at 15:02
  • 2
    @Zarat: BTW, one thing I'd like to see as a framework class would be an `InternTable` (with an `IEqualityComparer` as a construction parameter). Given an instance of `T`, determine whether the table already holds an entry which would compare equal. If so, compare that instance; otherwise, store the supplied instance in the table and return it. Such a table, if implemented well, could greatly improve the performance of nested immutable types where hash codes can pretty reliably distinguish unequal instances, but where comparing equal instances is expensive. – supercat Jul 26 '12 at 15:05
  • @Zarat: Using the `InternTable` would ensure that references to instances with equal contents would be replaced with references to the same instances. If one wishes to perform many comparisons on things like trees and subtrees which are built separately but will have many equal parts, such replacements may lead to tremendous efficiency improvements. Note that there is no benefit to keeping an object interned after all other references have disappeared, even when equivalent objects exist. – supercat Jul 26 '12 at 15:12
  • @supercat: (1) I resurrected the key. (2) The reachability of the table is irrelevant, except if it gets finalized it is cleared, otherwise it just acts as a list. (3) A weak reference gets cleared when freachable because its finalizer is run. Here the finalizer is on the table so it has no effect on ephemerons unless the table is collected and cleared. (4) InternTable can probably be implemented with a CWT, but more efficiently if you use reflection and ephemerons directly. – Zarat Aug 02 '12 at 22:58
  • 1
    @supercat: Thinking we're getting a bit off topic here :) If you want to discuss further you can do by mail, or on my blog. [I posted my reflection trickery](http://blog.gx.weltkante.de/2012/08/ephemerons-in-net-and-c.html) to reimplement DependentHandle and a new Ephemeron class analogue to what WeakReference is to GCHandle. – Zarat Aug 02 '12 at 23:00
  • I posted a question awhile ago about an intern table at http://stackoverflow.com/questions/10923682/any-weak-interning-collections-for-immutable-objects so if you'd like to discuss anything there, I'd like more thoughts on the subject. I would think that a weak-interning collection could greatly enhance the value of many immutable types. Comparing two trees which contain the same items but were built separately, for example, takes time O(n) every time they're compared. If the trees implement a good GetHashCode() function, however, and if code which builds trees interns them... – supercat Aug 03 '12 at 15:25
  • ...(including subtrees), then the time for comparison would be O(1) [since each tree would not just hold identical subtrees, but *the same* subtrees]. I'm not sure what the most efficient way to build a WeakInterning class would be, however, and so would welcome your thoughts on that other post. – supercat Aug 03 '12 at 15:31
  • Perhaps we could chat about such things? – supercat Aug 03 '12 at 15:50
  • 1
    I've been following this discussion with interest. I suggest you continue the discussion somewhere else and post a self-answered question with your findings. Please provide a link here so I can continue to follow this topic. Thanks! – dtb Aug 03 '12 at 16:58
  • 1
    If identity comparison cannot be used then ConditionalWeakTable is not an option. In this case I would suggest our implementation [WeakTable.cs](http://nesterovsky-bros.com/download/WeakTable.cs.txt), and our description in the blog [WeakTable](http://www.nesterovsky-bros.com/weblog/2014/01/08/WeakTable.aspx). – Vladimir Nesterovsky Jan 09 '14 at 06:25
  • @VladimirNesterovsky I have checked your code and you access concurrent dictionary in a finalizer. This exposes you to a rare race condition that can result in a deadlock. – Paya Mar 19 '15 at 18:30
  • @Paya I'm not sure you're correct in your reasoning. Finalizer is called inside dedicated thread (it's sometimes called FinalizerWorker). This thread is not different from any other thread, and does not prevent any other thread to progress. I guess you have confused FinalizerWorker thread with Garbage Collector thread that stops all other threads. If one would be able to lock something in GC thread then it could lead to dead lock. – Vladimir Nesterovsky Mar 20 '15 at 21:04
  • @VladimirNesterovsky If you take a look at [ConcurrentDictionary.TryRemove](http://referencesource.microsoft.com/#mscorlib/system/Collections/Concurrent/ConcurrentDictionary.cs,431), you will see it uses `lock`. Let's say you have only single thread in your app, and you try to remove something from WeakTable. GC kicks in while you are inside that `TryRemove` lock. In your destructor, you block on that lock, and since your main thread is suspended, it will never unlock and block forever. – Paya Mar 20 '15 at 22:30
  • @Paya my point is that the FinilizerWorker and GC are two different threads. Nothing is stopped or blocked while FinilizerWorker is working. FinilizerWorker is a thread where all Finalizers are queued to execute by GC thread. In contrast GC thead may decide to "stop the world" to trace graph of objects for the collection. – Vladimir Nesterovsky Mar 20 '15 at 22:51
  • @VladimirNesterovsky It seems you are correct, although the internet is full of people screaming to never `lock` in a finalizer. I guess you can't lock onto anything to cause GC thread (not finalizer thread) deadlock, can you? – Paya Mar 20 '15 at 23:46
  • @Paya That's my understanding. GC thread is implemented by the runtime environment, and no user code is run in its context. In fact it can be possible that there is no dedicated GC thread at all, as runtime can peek anyone, and stop all others. – Vladimir Nesterovsky Mar 20 '15 at 23:58
  • @VladimirNesterovsky I feel like I found another race condition - please let me know what you think: I call `Remove("asdf")` on thread 1, and the thread gets just before the line `states.Remove(state.key);` where it gets raced by thread 2, whcih calls `GetOrAdd("asdf")`. Because thread 1 did not remove the key from `ConditionalWeakTable` yet, and thread 2 will attempt to bind another object to the same key using `ConditionalWeakTable`, you will get an exception. – Paya Mar 21 '15 at 18:36
  • @Paya, race condition is not an evil, the only problem is to keep invariants. When thread 1 gets to the line states.Remove(state.key) the state.initialized == 2 (parked). Thread 2 in GetOrAdd will either a) find and return old value; b) return a new value. If a new state is uninitialized (state.initialized == 0), then it's marked as initialized (state.initialized == 1) and added to states. If it's observed that the state is been parked (state.initialized == 2) after the states.Add(key, state) then state is removed from states. In your case GetOrAdd will return the old value. – Vladimir Nesterovsky Mar 21 '15 at 20:47
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73496/discussion-between-vladimir-nesterovsky-and-paya). – Vladimir Nesterovsky Mar 21 '15 at 21:33