I have the following class:

[StructLayout(LayoutKind.Sequential)]
class Class
{
    public int Field1;
    public byte Field2;
    public short? Field3;
    public bool Field4;
}

How can I get the byte offset of Field4 starting from the start of the class data (or object header)?
To illustrate:

Class cls = new Class();
fixed(int* ptr1 = &cls.Field1) //first field
fixed(bool* ptr2 = &cls.Field4) //requested field
{
    Console.WriteLine((byte*)ptr2-(byte*)ptr1);
}

The resulting offset is, in this case, 5, because the runtime actually moves Field3 to the end of the type (and pads it), probably because its type is generic. I know there is Marshal.OffsetOf, but it returns the unmanaged offset, not the managed one.

How can I retrieve this offset from a FieldInfo instance? Is there any .NET method used for that, or do I have to write my own, taking all the exceptions into account (type size, padding, explicit offsets, etc.)?

IS4
  • @usr Meant `Field3`. Actually, it is reordered, to my surprise. It moved that field to the end of the class, padding it (debug and release, 32-bit). It may have something to do with inability to obtain pointers of generic types. – IS4 Jun 13 '15 at 11:14
  • You cannot find out, managed object layout is an implementation detail. Other than through the backdoor you already discovered. The CLR uses this to optimize the layout, making the object as small as possible while still providing alignment guarantees. [StructLayout] is only honored on marshaled structures. In which case Marshal.SizeOf() gives you the offset. – Hans Passant Jun 13 '15 at 15:24
  • @Hans But Marshal.SizeOf returns the unmanaged type size, not the managed one. I thought LayoutKind.Explicit is honored on structs, overlapping fields being the proof of it, without marshalling. – IS4 Jun 13 '15 at 16:26
  • Oops, Marshal.OffsetOf(). – Hans Passant Jun 13 '15 at 16:27
  • Element of least surprise violated. Not common with the CLR. Thanks for the question. – leppie Jun 19 '15 at 19:02

2 Answers

Offset of a field within a class or struct in .NET 4.7.2:

public static int GetFieldOffset(this FieldInfo fi) =>
                    GetFieldOffset(fi.FieldHandle);

public static int GetFieldOffset(RuntimeFieldHandle h) => 
                    Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;

These return the byte offset of a field within a class or struct, relative to the layout of some respective managed instance at runtime. This works for all StructLayout modes, and for both value- and reference-types (including generics, reference-containing, or otherwise non-blittable). The offset value is zero-based relative to the beginning of the user-defined content or 'data body' of the struct or class only, and doesn't include any header, prefix, or other pad bytes.
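
As a rough illustration, the sketch below lists the offsets reported for every instance field of the class from the question. The names `FieldOffsetDemo` and `ListOffsets` are made up for this example, and the arithmetic is copied unchanged from `GetFieldOffset` above, so it relies on the same undocumented runtime internals. The actual numbers depend on the runtime and bitness, so none are shown.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.InteropServices;

// The class from the question.
[StructLayout(LayoutKind.Sequential)]
class Class
{
    public int Field1;
    public byte Field2;
    public short? Field3;
    public bool Field4;
}

// Hypothetical helper type, for illustration only.
static class FieldOffsetDemo
{
    // Same arithmetic as GetFieldOffset above: read a 32-bit value located
    // (4 + IntPtr.Size) bytes past the address in RuntimeFieldHandle.Value
    // and keep its low 24 bits. This is an undocumented implementation detail.
    public static int GetFieldOffset(RuntimeFieldHandle h) =>
        Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;

    // Returns (field name, reported offset) pairs for all instance fields
    // of 't', sorted by offset. No instance of 't' is needed.
    public static (string Name, int Offset)[] ListOffsets(Type t) =>
        t.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
         .Select(f => (Name: f.Name, Offset: GetFieldOffset(f.FieldHandle)))
         .OrderBy(p => p.Offset)
         .ToArray();
}
```

Calling `FieldOffsetDemo.ListOffsets(typeof(Class))` and printing each pair shows the order in which the runtime actually placed the fields, which can be compared against the pointer subtraction in the question.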

Discussion

Since struct types have no header, the returned integer offset value can be used directly via pointer arithmetic, and System.Runtime.CompilerServices.Unsafe if necessary (not shown here). Reference-type objects, on the other hand, have a header which has to be skipped over in order to reference the desired field. This object header is usually a single IntPtr, which means IntPtr.Size needs to be added to the offset value. It is also necessary to dereference the GC ("garbage collection") handle to obtain the object's address in the first place.

With these considerations, we can synthesize a tracking reference to the interior of a GC object at runtime by combining the field offset (obtained via the method shown above) with an instance of the class (e.g. an Object handle).

The following method, which is only meaningful for class (and not struct) types, demonstrates the technique. For simplicity, it uses ref-return and the System.Runtime.CompilerServices.Unsafe library. Error checking, such as asserting fi.DeclaringType.IsAssignableFrom(obj.GetType()) for example, is also elided for simplicity.

/// <summary>
/// Returns a managed reference ("interior pointer") to the value or instance of type 'U'
/// stored in the field indicated by 'fi' within managed object instance 'obj'
/// </summary>
public static unsafe ref U RefFieldValue<U>(Object obj, FieldInfo fi)
{
    var pobj = Unsafe.As<Object, IntPtr>(ref obj);
    pobj += IntPtr.Size + GetFieldOffset(fi.FieldHandle);
    return ref Unsafe.AsRef<U>(pobj.ToPointer());
}

This method returns a managed "tracking" pointer into the interior of the garbage-collected object instance obj.[see comment] It can be used to arbitrarily read or write the field, so this one function replaces the traditional pair of separate getter/setter functions. Although the returned pointer cannot be stored in the GC heap and thus has a lifetime limited to the scope of the current stack frame (i.e., and below), it is very cheap to obtain at any time by simply calling the function again.
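
The comments below point out that the C# version above carries the object address in an IntPtr, which the GC does not track, so the object could in principle be relocated between reading the address and forming the reference. A possible variant, following the approach suggested in those comments, never leaves managed references. `SafeFieldRef`, `RawData`, and `Counter` are names invented for this sketch. It still depends on the undocumented `GetFieldOffset` arithmetic, and it assumes that the only field of a single-field class sits immediately after the object header.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical sample type used only to exercise the helper.
sealed class Counter { public string Label; public int Count; }

static class SafeFieldRef
{
    // Hypothetical stand-in type: reinterpreting any object as RawData lets us
    // take a managed ref to the first byte of its field data (assumption: the
    // single field 'First' is placed directly after the object header).
    private sealed class RawData { public byte First; }

    // Same undocumented arithmetic as GetFieldOffset above.
    public static int GetFieldOffset(RuntimeFieldHandle h) =>
        Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;

    // Core: start from a managed ref to the object's data, add the byte
    // offset, and reinterpret as 'ref U'. No raw pointer is ever formed, so
    // the GC can keep tracking the reference if it moves the object.
    public static ref U RefFieldValue<U>(object obj, int offs)
    {
        ref byte start = ref Unsafe.As<RawData>(obj).First;
        return ref Unsafe.As<byte, U>(ref Unsafe.AddByteOffset(ref start, (IntPtr)offs));
    }

    // Convenience overloads that look the offset up on every call.
    public static ref U RefFieldValue<U>(object obj, RuntimeFieldHandle h) =>
        ref RefFieldValue<U>(obj, GetFieldOffset(h));

    public static ref U RefFieldValue<U>(object obj, FieldInfo fi) =>
        ref RefFieldValue<U>(obj, GetFieldOffset(fi.FieldHandle));
}
```

For example, `SafeFieldRef.RefFieldValue<int>(counter, typeof(Counter).GetField("Count")) = 21;` writes through the returned reference. As with the original, nothing checks that `U` matches the field's real type or that the field belongs to `obj`, so a wrong argument corrupts memory silently.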

Note that this generic method is only parameterized with <U>, the type of the fetched pointed-at value, and not for the type ("<T>", perhaps) of the containing class (the same applies for the IL version below). It's because the bare-bones simplicity of this technique doesn't require it. We already know that the containing instance has to be a reference (class) type, so at runtime it will present via a reference handle to a GC object with object header, and those facts alone are sufficient here; nothing further needs to be known about putative type "T".

It's a matter of opinion whether adding vacuous <T, … >, which would allow us to indicate the where T: class constraint, would improve the look or feel of the example above. It certainly wouldn't hurt anything; I believe the JIT is smart enough to not generate additional generic method instantiations for generic arguments that have no effect. But since doing so seems chatty (other than for stating the constraint), I opted for the minimalism of strict necessity here.

In my own use, rather than passing a FieldInfo or its respective FieldHandle every time, what I actually retain are the various integer offset values for the fields of interest as returned from GetFieldOffset, since these are also invariant at runtime, once obtained. This eliminates the extra step (of calling GetFieldOffset) each time the pointer is fetched. In fact, since I am able to include IL code in my projects, here is the exact code that I use for the function above. As with the C# just shown, it trivially synthesizes a managed pointer from a containing GC-object obj, plus a (retained) integer offset offs within it.

// Returns a managed 'ByRef' pointer to the (struct or reference-type) instance of type U 
// stored in the field at byte offset 'offs' within reference type instance 'obj'

.method public static !!U& RefFieldValue<U>(object obj, int32 offs) aggressiveinlining
{
    ldarg obj
    ldarg offs
    sizeof object
    add
    add
    ret
}

So even if you are not able to directly incorporate this IL, showing it here, I think, nicely illustrates the extremely low runtime overhead and alluring simplicity, in general, of this technique.

Example usage

class MyClass { public byte b_bar; public String s0, s1; public int iFoo; }

The first demonstration gets the integer offset of reference-typed field s1 within an instance of MyClass, and then uses it to get and set the field value.

var fi = typeof(MyClass).GetField("s1");

// note that we can get a field offset without actually
// having any instance of 'MyClass'
var offs = GetFieldOffset(fi);

// i.e., later... 

var mc = new MyClass();

RefFieldValue<String>(mc, offs) = "moo-maa";      // field "setter"

// note: method call used as l-value, on the left-hand side of '=' assignment!

RefFieldValue<String>(mc, offs) += "!!";          // in-situ access

Console.WriteLine(mc.s1);                         // --> moo-maa!! (in the original)

// can be used as a non-ref "getter" for by-value access
var _ = RefFieldValue<String>(mc, offs) + "%%";   // 'mc.s1' not affected

If this seems a bit cluttered, you can dramatically clean it up by retaining the managed pointer as ref local variable. As you know, this type of pointer is automatically adjusted--with interior offset preserved--whenever the GC moves the containing object. This means that it will remain valid even as you continue accessing the field unawares. In exchange for allowing this capability, the CLR requires that the ref local variable itself not be allowed to escape its stack frame, which in this case is enforced by the C# compiler.

// demonstrate using 'RuntimeFieldHandle', and accessing a value-type
// field (int) this time
var h = typeof(MyClass).GetField(nameof(mc.iFoo)).FieldHandle; 

// later... (still using 'mc' instance created above)

// acquire managed pointer to 'mc.iFoo'
ref int i = ref RefFieldValue<int>(mc, h);      

i = 21;                                        // directly affects 'mc.iFoo'
Console.WriteLine(mc.iFoo == 21);              // --> true

i <<= 1;                                       // operates directly on 'mc.iFoo'
Console.WriteLine(mc.iFoo == 42);              // --> true

// any/all 'ref' uses of 'i' just affect 'mc.iFoo' directly:
Interlocked.CompareExchange(ref i, 34, 42);    // 'mc.iFoo' (and 'i' also): 42 -> 34

Summary

The usage examples focused on using the technique with a class object, but as noted, the GetFieldOffset method shown here works perfectly fine with struct as well. Just be sure not to use the RefFieldValue method with value-types, since that code includes adjusting for an expected object header. For that simpler case, just use System.Runtime.CompilerServices.Unsafe.AddByteOffset for your address arithmetic instead.
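
For the value-type case just mentioned, a minimal sketch might look like the following. `StructFieldRef`, `MyStruct`, and the two-parameter `RefFieldValue<T, U>` are names assumed for illustration; only `Unsafe.AddByteOffset` and `Unsafe.As` are real library calls, and the offset lookup is the same undocumented arithmetic as `GetFieldOffset` above.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical sample struct.
[StructLayout(LayoutKind.Sequential)]
struct MyStruct { public int A; public byte B; public long C; }

static class StructFieldRef
{
    // Same undocumented arithmetic as GetFieldOffset above.
    public static int GetFieldOffset(RuntimeFieldHandle h) =>
        Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;

    // A struct has no object header, so the offset is applied directly to a
    // managed ref to the struct itself; nothing is added for a header.
    public static ref U RefFieldValue<T, U>(ref T s, int offs) where T : struct =>
        ref Unsafe.As<T, U>(ref Unsafe.AddByteOffset(ref s, (IntPtr)offs));
}
```

A call such as `StructFieldRef.RefFieldValue<MyStruct, long>(ref s, offs) = 42;`, with `offs` obtained from the `C` field's handle, is intended to set `s.C`. The struct must be passed by `ref`; otherwise the write lands in a copy.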

Needless to say, this technique might seem a bit radical to some. I'll just note that it has worked flawlessly for me for many years, specifically on .NET Framework 4.7.2, and including 32- and 64-bit mode, debug vs. release, plus whichever various JIT optimization settings I've tried.

Glenn Slayden
  • Impressive, though quite dependent on implementation details. Something like this should be added to .NET reflection now that ref returns are common. I wonder about `RefFieldValue` though; doesn't `obj` need pinning before you can safely do pointer arithmetic on the interior? Otherwise GC might move the object around before you obtain the reference. – IS4 Jun 10 '19 at 09:28
  • @IllidanS4 I continue to wonder about that myself; I've been trying to find out if the GC only intercedes at "non-empty" states of the (IL) execution stack. I realize that the IL is gone by then, but sequence points could have been noted and mapped to native such that GC only pauses each managed thread when there can't be any refs picked up into registers...? But even if there's no such guarantee, the fix won't need full pinning anyway; explicitly publishing as a (non-pinned) local should be sufficient advertisement, but would have to come from the caller, maybe implicating `ref struct`. – Glenn Slayden Jun 10 '19 at 10:14
  • @IllidanS4 In other words, if GC can only happen at the implicit sequence points according to the original IL code, then `obj` is protected by being the (execution-stacked) 'ref return' of `RefFieldValue`. – Glenn Slayden Jun 10 '19 at 20:48
  • I am not sure that's the case - the GC is free to move all references at any time, and I don't think that being an argument to the method protects it. Otherwise there would be no need for a `pinned`/`fixed` reference. – IS4 Jun 11 '19 at 12:43
  • @IllidanS4 Well even with pinning, there must be *some* provision that prevents the GC from altering an address value between, say, an `lea esi,[ebx+...]` instruction and an immediately following `mov eax,[esi]`. I wouldn't think the GC would want to start trying to figure out how to fix-up values that have already been picked-up in CPU registers... – Glenn Slayden Jun 12 '19 at 04:39
  • You're safe while an object ref is in an object type variable, but I don't think you're safe without pinning if the object ref is in an IntPtr, so I'm a bit worried about the object getting moved while the field offset is being added to the IntPtr before being turned into a proper interior reference. – Mike Marynowski Sep 02 '19 at 16:22
  • @MikeMarynowski Agreed. I must confess that what you mention would seem to be a difference between the C# version (which transits through `IntPtr`) and IL version (which doesn't) shown above, and it's only the IL version that I actually use extensively without incident. – Glenn Slayden Sep 03 '19 at 19:19
  • The IL version is interesting. I'm not 100% sure if it is safe or not. When the GC scans stack for refs, it needs some kind of way to determine which memory locations / registers might contain refs...I don't know how that algorithm will see the memory addresses there. Is it based off the method call param types or the ldarg object type or both? I don't know. I remember reading that it tries to be conservative though so this could be safe. – Mike Marynowski Sep 04 '19 at 21:20
  • Relevant read: https://stackoverflow.com/questions/17130382/understanding-garbage-collection-in-net/17131389#17131389 It would seem to me that this is safe - obj ref is first loaded with ldarg and that stays on the stack until the last add, at which point the GC should know that that value is the return type which is an internal ref. I'm going to ping Hans on that thread to see what he thinks. – Mike Marynowski Sep 04 '19 at 21:34
  • @MikeMarynowski I'm assuming that the `ldarg obj` is sufficient to protect the generation of the managed ref itself. And then, that the nascent managed ref is protected by the (lack of) sequence point in my IL, i.e., being execution-stacked all the way until it becomes the caller's responsibility. Please do correct me if I'm wrong. – Glenn Slayden Sep 04 '19 at 23:25
  • That C# version of `RefFieldValue` is definitely not safe, but it can be made safe by keeping the interior ref managed throughout. You can reinterpret to a type of known layout to get a starting managed ref and then add your offset to that. Something like: `class AnyClass { public byte FirstField; }` and then: `static ref U RefFieldValue<U>(object obj, FieldInfo fi) => ref Unsafe.As<byte, U>(ref Unsafe.AddByteOffset(ref Unsafe.As<AnyClass>(obj).FirstField, GetFieldOffset(fi)));` – saucecontrol Dec 28 '20 at 20:53
  • hi, stupid question, how would I get the instance of the class given a property of it? – Alex Sep 02 '22 at 19:37
  • @Alex I assume you mean, get the class instance (GC handle) given a reference to one of its fields? According to [this](https://stackoverflow.com/questions/52824792/recover-containing-gc-object-from-managed-ref-interior-pointer), you can't. – Glenn Slayden Sep 02 '22 at 19:48
  • I thought I was being very clever by trying to implement linux's "container_of" from list.h https://github.com/torvalds/linux/blob/master/include/linux/list.h to cast crawled head back to parent object. that's a bummer, thanks for the quick reply. – Alex Sep 02 '22 at 19:56
  • @GlennSlayden Can you help me understand what the `+ 4` is in `GetFieldOffset`? It feels like the other two `IntPtr.Size` increments to the object pointer are for the object header and method table, but the addition of 4 bytes worth of addressing leaves me thinking I'm missing something here? – TheXenocide Nov 05 '22 at 03:16
  • @TheXenocide We're accessing a row in one of the IL runtime metadata tables here, not anything to do with a GC handle or a GC object instance itself. So the address calculation has to do with the position of the IL metadata token value for a field in the [mtdFieldDef](https://learn.microsoft.com/en-us/dotnet/framework/unmanaged-api/metadata/cortokentype-enumeration) table. – Glenn Slayden Nov 14 '22 at 20:37
  • @GlennSlayden ahh, okay, I haven't looked into the inner workings of that much; I'm more used to digging through memory dumps than using IL in these ways. I will note, though, that I did some performance testing and, at least for my very limited use case, FieldInfo.SetValue had comparable performance (sometimes faster, sometimes slower, always minimally different, across many tests) which I only note in case someone else tries this to avoid reflection to set private field values. I cached delegates during initialization to avoid repeated lookup costs. My case also had no boxing. – TheXenocide Dec 11 '22 at 22:43

With some tricks around TypedReference.MakeTypedReference, it is possible to obtain the reference to the field, and to the start of the object's data, then just subtract. The method can be found in SharpUtils.
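
The subtraction idea can also be sketched with `Unsafe.ByteOffset` from System.Runtime.CompilerServices.Unsafe, which is not necessarily how SharpUtils does it. This version measures the distance between two fields of one live instance, as in the question, rather than from the start of the object's data. `MeasuredOffset` and `Field4FromField1` are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// The class from the question.
[StructLayout(LayoutKind.Sequential)]
class Class
{
    public int Field1;
    public byte Field2;
    public short? Field3;
    public bool Field4;
}

// Hypothetical helper type, for illustration only.
static class MeasuredOffset
{
    // Byte distance from Field1 to Field4 within the same instance.
    // Both operands are managed refs into one object, so no pinning or
    // 'fixed' block is needed to take the difference.
    public static long Field4FromField1(Class cls)
    {
        ref byte origin = ref Unsafe.As<int, byte>(ref cls.Field1);
        ref byte target = ref Unsafe.As<bool, byte>(ref cls.Field4);
        return (long)Unsafe.ByteOffset(ref origin, ref target);
    }
}
```

On the asker's 32-bit runtime the equivalent pointer subtraction printed 5; other runtimes may lay the fields out differently. Because the field accesses are written out in source, this only covers fields known at compile time, which is why a reflection-driven approach needs something like `TypedReference` instead.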

IS4
  • Is there perhaps some way to do this without using any kind of code generation in `Pin` and `MakeTypedReference` (so no DynamicMethod or Expressions)? – Riki May 25 '19 at 09:51
  • @riki Perhaps using the new `Memory<T>` and `Span<T>` types and similar unsafe API, but I haven't had experience with those. – IS4 Jun 04 '19 at 09:49