
I've got some really large memory dumps of a managed process. I'm trying to gather a lot of statistics from them and also present an interactive view of fairly deep object graphs on the heap. Think something comparable to !do <address> with prefer_dml 1 set in WinDbg with SOS, where you can continually click on the properties and see their values, only in a much friendlier UI for comparing many objects.

I've found Microsoft.Diagnostics.Runtime (ClrMD) to be particularly well suited for this task, but I'm having a hard time working with array fields and I'm a little confused about object fields, which I have working a little better.


Array: If I target an array with an address directly off the heap and use ClrType.GetArrayLength and ClrType.GetArrayElementValue things work fine, but once I'm digging through the fields on another object, I'm not sure what value I'm getting from ClrInstanceField.GetValue when the ClrInstanceField.ElementType is ClrElementType.SZArray (I haven't encountered Array digging around in my object graph yet, but I should like to handle it as well).

Edit: I just decided to use the ClrType for System.UInt64 to dereference the array field (using parent address + offset of the array field to calculate the address where the array pointer is stored), then I can work with it the same as if I got it from EnumerateObjects. I am now having some difficulty with some arrays not supporting the ArrayComponentType property. I have yet to test with arrays of Structs so I am also wondering if that will be a C-style allocation of inline structs, as it is with int[] or if it will be an array of pointers to structs on the heap. Guid[] is one of the types I'm having an issue getting the ArrayComponentType from.

Object: Fixed (logic error). With a ClrInstanceField that has an ElementType of ClrElementType.Object I get much better results, but I still need a little more. Firstly, after calling GetFieldValue I get back a ulong address(?) which I can use ClrInstanceField.Type.Fields against just fine, so I can see the field names and values of the nested object. That said, I have to account for polymorphism, so I tried using ClrHeap.GetObjectType on the same address, and it either returns NULL or something completely incorrect. It seems odd that the address would work in my first use case, but not the second.

String: Fixed (found workaround). Because my real project already uses DbgEng w/ SOS, I have a different way to easily get the value of strings by address, but it seemed very odd that ClrInstanceField.GetFieldValue succeeded in returning a string, but with completely inaccurate results (a bunch of strange characters). Maybe I'm doing this wrong?


Edit: I have extracted an abstraction that now runs in LINQPad from my original code. It's a bit long to post here, but it's all here in a gist. It's still a little messy from all the copy/paste/refactor, and I'll be cleaning it up further and likely posting the final source on either CodePlex or GitHub after I've got these issues fixed.

The code base is fairly large and specific to a project, but if it's absolutely necessary I may be able to extract a sample set. That said, all access to the ClrMD objects is fairly simple. I get the initial addresses from SOS commands like !dumpheap -stat (which works fine for the root objects) and then I use ClrHeap.GetTypeByName or ClrHeap.GetObjectType. After that it relies exclusively on ClrType.Fields and the ClrInstanceField members Type, ElementType, and GetFieldValue.

As an added bonus, I did find a browser friendly version of the XML Docs provided with the NuGet package, though it's the same documentation IntelliSense provides.

TheXenocide
    I've created an extension library on top of ClrMD that integrates nicely with LINQPad; check it out if you are interested: https://github.com/JeffCyr/ClrMD.Extensions – Jeff Cyr Dec 20 '14 at 04:37

1 Answer


It's going to be hard to answer very precisely without seeing what your code looks like, but basically, it goes like this:

The first thing you need to know in order to be able to call GetFieldAddress/GetFieldValue is if the object address you have is a regular pointer or an interior pointer. That is, if it directly points to an object on the heap, or to an interior structure within an actual object (think String vs. Struct field within an actual object).

If you're getting the wrong values out of GetFieldAddress/GetFieldValue, it usually means you're not specifying that you have an interior pointer (or you thought you had one when you didn't).

The second part is understanding what the values mean.

If field.IsPrimitive() is true: GetFieldValue() will get you the actual primitive value (e.g. an Int32, Byte, or whatever).

If field.IsValueClass() is true, then GetFieldAddress() will get you an interior pointer to the structure. Any later GetFieldAddress()/GetFieldValue() calls you make with that address therefore need to be told that it is an interior pointer!

If field.ElementType is ClrElementType.String, then, as far as I remember, GetFieldValue() will get you the actual string contents (I need to check, but this should be it).

Otherwise, you have an object reference, in which case GetFieldValue() will get you a regular pointer to the new reference object.
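The four cases above can be sketched roughly as follows. This is only an illustration, assuming the ClrMD beta API as it is named in this thread (GetFieldValue/GetFieldAddress overloads that take an `interior` flag, IsPrimitive(), IsValueClass()); `FieldWalker` and `DumpFields` are hypothetical names, not part of ClrMD, and later ClrMD releases renamed some of these members.

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

static class FieldWalker
{
    // Hypothetical helper, not part of ClrMD.
    // interior == false: 'address' is a regular object reference on the heap.
    // interior == true:  'address' points at a struct embedded inside another object.
    public static void DumpFields(ClrHeap heap, ClrType type, ulong address, bool interior, string indent)
    {
        foreach (ClrInstanceField field in type.Fields)
        {
            if (field.IsPrimitive())
            {
                // Primitive: the returned value is the boxed primitive itself.
                Console.WriteLine("{0}{1} = {2}", indent, field.Name, field.GetFieldValue(address, interior));
            }
            else if (field.IsValueClass())
            {
                // Embedded struct: take its address and recurse with interior = true.
                ulong structAddress = field.GetFieldAddress(address, interior);
                Console.WriteLine("{0}{1} (struct at 0x{2:x})", indent, field.Name, structAddress);
                if (field.Type != null)
                    DumpFields(heap, field.Type, structAddress, true, indent + "  ");
            }
            else if (field.ElementType == ClrElementType.String)
            {
                // String: assumed (per the answer) to come back as the string contents.
                Console.WriteLine("{0}{1} = \"{2}\"", indent, field.Name, field.GetFieldValue(address, interior));
            }
            else
            {
                // Object reference: the value is a regular pointer to another heap object,
                // so the actual (possibly derived) type is looked up from the heap.
                object raw = field.GetFieldValue(address, interior);
                ulong reference = raw is ulong ? (ulong)raw : 0UL;
                ClrType actual = reference != 0 ? heap.GetObjectType(reference) : null;
                Console.WriteLine("{0}{1} -> 0x{2:x} ({3})", indent, field.Name, reference,
                    actual != null ? actual.Name : "null or unknown type");
            }
        }
    }
}
```

For an address taken straight off the heap, the first call would look like `FieldWalker.DumpFields(heap, heap.GetObjectType(address), address, false, "")`; only the recursive call for an embedded struct passes `true`.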

Does this make sense?

tomasr
  • It makes perfect sense, though I think some of the difficulty I was having was based on an abstraction in my code that didn't know whether it was interior or not; it turned out easier for me to think about by just using Address and Offset for some scenarios, though now that I'm cleaning up my code I might be able to go back to using GetFieldValue. I've actually got much of my code extracted now, almost up and running standalone in LINQPad (struggling with SZArrays that don't report their ArrayComponentType for some reason). – TheXenocide Mar 06 '14 at 20:06
  • As for string, GetFieldValue was definitely returning weird values, but I commented on a workaround I found that gets the string length and then just uses ReadMemory to populate a byte[] of Unicode data, which is easy enough to work with. Once my standalone runs correctly I'll update the post and maybe ask for some more feedback? Thanks kindly. – TheXenocide Mar 06 '14 at 20:10
  • I've updated the question, which now also includes a link to the code. – TheXenocide Mar 07 '14 at 00:05
  • I took a quick look at the code, and it does seem like you need to keep better track of interior pointers as you navigate an object structure. It's really not hard, but it does require thinking clearly about what the address is pointing to. – tomasr Mar 07 '14 at 00:37
  • Also, about Arrays, something you need to keep in mind is that there will be instances where ArrayComponentType is missing because there are issues in the CLR debugging interfaces where that information is lost. That's really only significant if you have an array of primitive/structs, otherwise, you should always get the type from the heap using the item address. – tomasr Mar 07 '14 at 00:39
  • Finally, regarding strings: You don't need to mess around to get the actual value. If you have an address that points to a System.String object, just do something like: `String val = heap.GetObjectType(address).GetValue(address) as String;` – tomasr Mar 07 '14 at 00:42
  • Yeah, I thought I would need to add a parameter/etc. to the struct abstraction so it knew whether it was directly off the heap or not. I'm trying the string out now since it's low hanging fruit. Alas, I wish I could get at some of the Guid arrays in this particular memory dump, but as long as it's a core issue, not something I'm doing wrong, then I'll just account for it in the array classes. Thanks a ton for the input! – TheXenocide Mar 07 '14 at 14:28
  • Actually; regarding ValueType arrays; is it feasible to use raw memory or direct access of some sort to inspect the array? I get that the ClrMD stuff might not have the ability, but the dump should still contain the data as long as it's on the heap, right? Maybe I can use some combination of Addresses/Offsets? Just a thought; as long as I have reference arrays I can do most of what I need for now. – TheXenocide Mar 07 '14 at 14:33
  • I knew I had seen some stuff about this regarding SOS/WinDbg. I think I can use the information here: http://blogs.msdn.com/b/shawnfa/archive/2004/04/30/124218.aspx but it brings back up a question I've been wondering: Is there a way to get a ClrType from a MethodTable address? My real project actually uses both ClrMD *and* DbgEng + SOS. – TheXenocide Mar 07 '14 at 14:39
  • Dealing with ValueType arrays is perfectly possible with ClrMD right now. Basically, for each entry call arrayType.GetArrayElementAddress() and treat the result as an interior pointer when calling GetFieldAddress/Value later on. – tomasr Mar 07 '14 at 23:51
  • Also, MethodTable addresses are not exposed in ClrMD directly, as they have some downsides (but sure are convenient). What kind of matching do you need to do between the two sides? Most (if not all) of the stuff SOS does can be done with ClrMD... – tomasr Mar 07 '14 at 23:52
  • I have arrays that don't expose their component type properly and have some errors when using standard array methods (as you mentioned above, it seems there are some limitations to the interface). I have some Guid[] and dictionary buckets that should be available in the memory dump, but for which I can't seem to get the ClrType very cleanly. I was thinking the article I linked (and some others) point out the memory layout, which includes MT info. I was just hoping for a reliable way to get the ClrType. I tried removing [] off the end of the Array type name, but generics got the best of me. – TheXenocide Mar 10 '14 at 13:45
  • If you *know* what the actual object type is, then you just need to fetch the ClrType from the ClrHeap object and it will work. If you're writing generic code, then stepping around the CLR limitation is hard because you really don't know what you're dealing with. – tomasr Mar 10 '14 at 18:17
  • So the type for the array itself seems relatively informative, but not quite good enough which is why I was wondering about the MT. The array reports its name correctly (e.g. `System.Guid[]` or `System.Collections.Generic.Dictionary+Entry[]`), but the ArrayComponentType returns null for these, and the other array methods don't work. The first thing I tried was to remove the trailing [] from the name and try to get the type from the heap, but I don't think the generics play nice with it. – TheXenocide Mar 10 '14 at 23:03
  • I may also be able to extract some information from the parent object, but I haven't had a chance to experiment a ton. The Dictionary those entries above come from lists its name as: From a parent object I do have `System.Collections.Generic.Dictionary` – TheXenocide Mar 10 '14 at 23:04
  • Can you share a sample dump and piece of code that shows the issue? I'd be happy to take a look... – tomasr Mar 10 '14 at 23:16
  • The one I'm using is proprietary so I'll have to set aside some time to try to make some repros, but I'd be glad to give it a whirl. Work priorities have been readjusted a bit, but once this is complete I'm hoping to make an open source project out of it, so some more personal time is due (just a little harder to come by). I'm thinking of making it into something with some common troubleshooting views and a mechanism for querying and digging through. You'll definitely be credited for all the help. Thanks so much! – TheXenocide Mar 12 '14 at 16:10
  • Apologies for the delays, I've had a very hectic week. I'll try to carve out some spare time this weekend (assuming the weekend is any better than the last couple weeks have been lol). – TheXenocide Mar 28 '14 at 13:40
  • Can someone point me to information on how and when to use interior=true? So far, I see no logic in this boolean and can't find concise information on when it should be true. Hope someone can help ... – Diana Mar 07 '16 at 12:03
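
As a rough illustration of the value-type array advice in the comments above (call GetArrayElementAddress() for each entry and treat the result as an interior pointer), a sketch might look like the following. It assumes the same ClrMD beta API named in this thread; `StructArrayWalker` and `DumpStructArray` are hypothetical names, and the sketch simply gives up when ArrayComponentType is missing, as it reportedly is for some Guid[] arrays.

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

static class StructArrayWalker
{
    // Hypothetical helper, not part of ClrMD.
    // 'arrayAddress' is a regular object reference to an array of structs.
    public static void DumpStructArray(ClrHeap heap, ulong arrayAddress)
    {
        ClrType arrayType = heap.GetObjectType(arrayAddress);
        if (arrayType == null || !arrayType.IsArray)
            return;

        ClrType elementType = arrayType.ArrayComponentType;
        if (elementType == null)
        {
            // The thread reports this happening for some arrays; nothing more can be
            // done here without knowing the element type by other means.
            Console.WriteLine("{0}: component type unavailable", arrayType.Name);
            return;
        }

        int length = arrayType.GetArrayLength(arrayAddress);
        for (int i = 0; i < length; i++)
        {
            // Each element lives inline in the array, so its address is an interior
            // pointer: it does not point at an object header.
            ulong elementAddress = arrayType.GetArrayElementAddress(arrayAddress, i);

            foreach (ClrInstanceField field in elementType.Fields)
            {
                // interior = true because elementAddress points into the array's storage.
                // Fields that are themselves structs would need GetFieldAddress instead.
                object value = field.GetFieldValue(elementAddress, true);
                Console.WriteLine("[{0}].{1} = {2}", i, field.Name, value);
            }
        }
    }
}
```

Read this way, `interior` is `false` whenever the address is an object reference (from the heap walk, a reference-type field, or an element of a reference array) and `true` whenever the address came from GetFieldAddress() on a struct field or from GetArrayElementAddress() on a struct array.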