Best way to keep track of objects when writing a debugger

Question

Background:

I am writing a debugger for an interpreted language in C#. I am only writing the "debugger server", which handles the language-specific tasks, such as listing local variables. The result is then sent to a "debugger client" (an editor that displays the listed variables), such as VS Code.

The debugger uses Microsoft's Debug Adapter Protocol, so it has to follow a specific format. To properly display complex data types (objects, arrays) in the debugger window, every object needs to be assigned a variablesReference, which is a unique integer ID for that object.

When the user clicks on an object to "unfold" it in the "Variables" view, a request containing the variable reference is sent to the debugger server, and the server responds with the values inside that object.
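A minimal sketch of the shape of that exchange, assuming a hand-rolled model of the Debug Adapter Protocol `Variable` object (only its `name`, `value` and `variablesReference` fields) rather than any particular DAP library. A `variablesReference` of 0 tells the client the value cannot be expanded; a positive value is the handle the client sends back in a later `variables` request. `DapVariable`, `VariableBuilder` and the `getReference` callback are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal model of a DAP "Variable": only the fields used here.
public sealed class DapVariable
{
    public DapVariable(string name, string value, int variablesReference)
    {
        Name = name;
        Value = value;
        VariablesReference = variablesReference;
    }

    public string Name { get; }
    public string Value { get; }
    public int VariablesReference { get; }
}

public static class VariableBuilder
{
    // Builds the variables for one container (e.g. the fields of an object).
    // getReference looks up or assigns the ID of an expandable child object.
    public static List<DapVariable> Build(
        IEnumerable<KeyValuePair<string, object>> members,
        Func<object, int> getReference)
    {
        var result = new List<DapVariable>();
        foreach (var member in members)
        {
            object value = member.Value;
            // Treat null, strings and primitive types as leaves (reference 0).
            bool expandable = value != null && !(value is string) && !value.GetType().IsPrimitive;
            int reference = expandable ? getReference(value) : 0;
            result.Add(new DapVariable(member.Key, value?.ToString() ?? "null", reference));
        }
        return result;
    }
}
```

With this split, the only piece that has to remember objects is whatever sits behind `getReference`, which is exactly the map the rest of the question is about.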

My progress so far:

To keep track of which object corresponds to which ID, I have created a bidirectional map between the objects the debugger keeps track of and their IDs (implemented as two dictionaries).

When the debugger encounters an object, I try to look it up in this map to get its ID. If it is not present, I assign it a new ID and save it in the map for later. When the frontend asks for the values in an object (because the user clicked on it), I look up the object in the map by its ID and resolve the request.
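The lookup-or-assign logic above can be sketched as follows. This is a minimal sketch that assumes .NET 5 or later, where `System.Collections.Generic.ReferenceEqualityComparer` is available; it makes the object-to-ID dictionary compare keys by reference and hash them by object identity, so neither an overridden `GetHashCode` nor a change to the object's contents affects the lookup. `ObjectRegistry` and its members are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical bidirectional map between debuggee objects and DAP variable references.
public sealed class ObjectRegistry
{
    // Object -> ID, keyed by reference identity rather than by the object's contents.
    private readonly Dictionary<object, int> _idsByObject =
        new Dictionary<object, int>(ReferenceEqualityComparer.Instance);

    // ID -> object, used to answer a later "variables" request.
    private readonly Dictionary<int, object> _objectsById = new Dictionary<int, object>();

    private int _nextId = 1; // 0 is reserved for "not expandable"

    public int GetOrAddId(object obj)
    {
        if (_idsByObject.TryGetValue(obj, out int id))
            return id; // already registered

        id = _nextId++;
        _idsByObject.Add(obj, id);
        _objectsById.Add(id, obj);
        return id;
    }

    public bool TryGetObject(int id, out object obj) => _objectsById.TryGetValue(id, out obj);
}
```

Note that this version holds strong references, so registered objects stay alive for as long as the registry does.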

The problem:

How should I store the objects in the map, so that I can look up an existing object's ID, or easily tell that an object is not yet registered?

I have tried using a hash map (dictionary), where the hash is computed from the object's address and equality is implemented as reference equality. Note that computing the hash from the object's contents is not possible, since the contents can change during debugging, and the object would then no longer be found in the dictionary once its hash has changed.
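On frameworks without `ReferenceEqualityComparer` (for example .NET Framework), the same effect can be had with a small custom comparer. A minimal sketch: equality is `object.ReferenceEquals`, and the hash comes from `System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode`, which calls `Object.GetHashCode` non-virtually even if the type overrides it. That identity hash is not computed from the object's contents, so mutating the object does not change it. `IdentityComparer` and `MutablePoint` are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical comparer: identity equality plus identity-based hash.
public sealed class IdentityComparer : IEqualityComparer<object>
{
    public static readonly IdentityComparer Instance = new IdentityComparer();

    // "new" hides the static object.Equals(object, object); this is the interface method.
    public new bool Equals(object x, object y) => ReferenceEquals(x, y);

    // Calls Object.GetHashCode non-virtually, ignoring any override on the type.
    public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical type whose own hash depends on mutable state.
public sealed class MutablePoint
{
    public int X;
    public override bool Equals(object obj) => obj is MutablePoint p && p.X == X;
    public override int GetHashCode() => X;
}
```

A `Dictionary<object, int>` built with `IdentityComparer.Instance` keeps finding a `MutablePoint` after its `X` changes, and never confuses two distinct points that happen to be equal by value.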

I need a hash that stays the same even when the contents of the object change. The object's address seems perfect for that, but I can't find a solution that works reliably. For example, this:

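using System.Runtime.InteropServices;

public int GetHashCode(object obj)
{
    // Pinning only works for blittable objects; for anything else Alloc throws ArgumentException.
    GCHandle gch = GCHandle.Alloc(obj, GCHandleType.Pinned);
    int hash = gch.AddrOfPinnedObject().ToInt32();
    gch.Free(); // release the pin so the GC can move the object again
    return hash;
}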

throws an ArgumentException: "Object contains non-primitive or non-blittable data."

However, that is not the only problem. My understanding is that the GC can move objects around in memory (compaction) and update all references to them when it does so. This would change the address and break the hash code.
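One way to check that the identity hash does not depend on the object's current address is to read it, force a full blocking, compacting collection, and read it again. This is only a sketch: it cannot show whether the object actually moved, only that the value returned by `RuntimeHelpers.GetHashCode` stays the same across the collection and across a mutation. `HashStabilityCheck` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class HashStabilityCheck
{
    // Returns the identity hash of a new object before and after a forced GC.
    public static (int Before, int After) Run()
    {
        var target = new List<string> { "a" };
        int before = RuntimeHelpers.GetHashCode(target);

        // Create garbage so that compaction has something to squeeze out.
        for (int i = 0; i < 10_000; i++)
            _ = new byte[128];

        // Full, blocking, compacting collection of all generations.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true, compacting: true);
        GC.WaitForPendingFinalizers();

        target.Add("b"); // mutate as well; the identity hash should not change
        int after = RuntimeHelpers.GetHashCode(target);
        return (before, after);
    }
}
```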

Possible solutions:

Get rid of the hash map (dictionary): when searching for the existing ID of an object, manually compare it by reference against all existing objects. This would fix the problem, but would be slow, since every lookup is a linear scan over all registered objects.
Manually add the ID directly to the object on creation: this seems to make the most sense, but I am writing the debugger as a mod, so IMO this would require too many changes to the code for a mod.
Fix the GetAddress method: even though the address can change, this is a valid solution, because an object can be in the map twice and the debugger will still work properly.
Allocate a new ID each time the object is seen: this would mean only a one-directional map from ID to object, and each object would appear in it multiple times with different IDs. It would work, but would be very heavy on memory.
Get rid of the map altogether: this would require not only fixing the GetAddress method, but also using the address directly as the ID and casting the ID (address) back to the object. This can be dangerous, since the memory at that address can contain arbitrary data by the time the request for that ID arrives, and it would not work with 64-bit addresses, since the ID is only an integer.
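For the "allocate a new ID each time" option, the memory cost is bounded in practice because the Debug Adapter Protocol limits the lifetime of object references to the current suspended state: once execution resumes, clients must not use them again. The protocol also keeps references in the open interval (0, 2^31), which fits a C# `int`. A minimal sketch of such a table, cleared whenever the debuggee continues; `HandleTable` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical one-directional table: ID -> object, valid for a single stop.
public sealed class HandleTable
{
    private readonly List<object> _objects = new List<object>();

    // Every call hands out a fresh ID, even for an object seen before.
    public int Create(object obj)
    {
        _objects.Add(obj);
        return _objects.Count; // IDs start at 1, because 0 means "not expandable"
    }

    public bool TryGet(int id, out object obj)
    {
        bool valid = id >= 1 && id <= _objects.Count;
        obj = valid ? _objects[id - 1] : null;
        return valid;
    }

    // Call when the debuggee resumes: all previously issued IDs become invalid.
    public void Reset() => _objects.Clear();
}
```

The trade-off is that the same object shown twice in one stop gets two IDs, but no object-to-ID lookup (and therefore no stable hash) is needed at all.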

Are there any other possible solutions? How do debuggers usually solve this problem?

I would go for a Dictionary mapping `WeakReference<>` to IDs. Other options can be found here: https://stackoverflow.com/questions/750947/net-unique-object-identifier — Klaus Gütter, Oct 06 '19 at 05:33
Possible duplicate of [.NET unique object identifier](https://stackoverflow.com/questions/750947/net-unique-object-identifier) — kajacx, Oct 06 '19 at 10:51
Thanks, `WeakReference` certainly solves the problem of unused objects (which I was planning to solve later), but I still need an efficient hash from object to ID that does not change when the object's contents change or when it is moved in memory. — kajacx, Oct 06 '19 at 10:53
Apologies, my username was changed and I was super confused. Anyway, @KlausGütter, as I was saying, I still need a reliable and efficient way to map from an object to an (integer) ID. The `ConditionalWeakTable` looks promising, but it apparently uses `RuntimeHelpers.GetHashCode` to compute the hash, which in turn uses the default `object` implementation, which derives the hash from the object address, if I understand correctly. So we are back to the problem of GC reallocation. Unless the `ConditionalWeakTable` can somehow listen for GC reallocation and re-index the moved objects? — kajacx, Oct 06 '19 at 11:45
What makes you believe that `RuntimeHelpers.GetHashCode` uses the object's address? I strongly doubt this. — Klaus Gütter, Oct 06 '19 at 12:02
The answers there said it uses the default object hash code method, even when the object being hashed has overridden it. I tried to look up how the default method works, but could not find anything I could understand or that would explain how the method actually works. I always thought it used the object address, because what else could it use that differs between different objects (instances of the `object` class) but is the same for the same object? — kajacx, Oct 06 '19 at 17:21
Probably it uses some internal identification that may also be used by the GC. It cannot be the address because, as you noted, the address may change because of memory compaction, and the hash is not allowed to change. — Klaus Gütter, Oct 06 '19 at 17:49
These articles shed some light on how `RuntimeHelpers.GetHashCode` is implemented: https://weekly-geekly.github.io/articles/149584/index.html and https://web.archive.org/web/20150515023057/https://msdn.microsoft.com/en-us/magazine/cc163791.aspx. Basically it is calculated on demand and stored in the SyncBlock (which is also allocated on demand). — Klaus Gütter, Oct 07 '19 at 05:20
Thanks, I finally found an article here: https://codingsight.com/the-origin-of-gethashcode-in-net/ and yeah, the hash is stored in the object's `SyncBlock`. An interesting solution. — kajacx, Oct 07 '19 at 09:53
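Combining the suggestions from these comments, a minimal sketch could use `ConditionalWeakTable<TKey, TValue>` for the object-to-ID direction and a dictionary of `WeakReference<object>` for the ID-to-object direction. `ConditionalWeakTable` uses reference equality for keys and does not keep them alive, and because the identity hash is stored with the object rather than derived from its address, entries survive compaction. Its value type must be a reference type, hence the small `IdBox` wrapper. `WeakObjectRegistry` and `IdBox` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical wrapper: ConditionalWeakTable values must be reference types.
public sealed class IdBox
{
    public IdBox(int id) { Id = id; }
    public int Id { get; }
}

public sealed class WeakObjectRegistry
{
    // Object -> ID; keys are held weakly and compared by reference.
    private readonly ConditionalWeakTable<object, IdBox> _ids = new ConditionalWeakTable<object, IdBox>();

    // ID -> object; weak so the registry does not keep debuggee objects alive.
    private readonly Dictionary<int, WeakReference<object>> _objects = new Dictionary<int, WeakReference<object>>();

    private int _nextId = 1; // 0 is reserved for "not expandable"

    public int GetOrAddId(object obj)
    {
        if (_ids.TryGetValue(obj, out IdBox box))
            return box.Id;

        box = new IdBox(_nextId++);
        _ids.Add(obj, box);
        _objects.Add(box.Id, new WeakReference<object>(obj));
        return box.Id;
    }

    // Fails if the ID is unknown or the object has already been collected.
    public bool TryGetObject(int id, out object obj)
    {
        obj = null;
        return _objects.TryGetValue(id, out WeakReference<object> weak) && weak.TryGetTarget(out obj);
    }
}
```

In this sketch, entries for collected objects stay in `_objects` as dead weak references; a longer-running debugger would want to prune them periodically, or clear the whole registry when execution resumes.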
