28

How do I marshal this C++ type?

The ABS_DATA structure is used to associate an arbitrarily long data block with the length information. The declared length of the Data array is 1, but the actual length is given by the Length member.

typedef struct abs_data {
  ABS_DWORD Length;
  ABS_BYTE Data[ABS_VARLEN];
} ABS_DATA;

I tried the following code, but it's not working. The data variable is always empty and I'm sure it has data in there.

[System.Runtime.InteropServices.StructLayoutAttribute(System.Runtime.InteropServices.LayoutKind.Sequential, CharSet = System.Runtime.InteropServices.CharSet.Ansi)]
public struct abs_data
{
    /// ABS_DWORD->unsigned int
    public uint Length;

    /// ABS_BYTE[1]
    [System.Runtime.InteropServices.MarshalAsAttribute(System.Runtime.InteropServices.UnmanagedType.ByValTStr, SizeConst = 1)]
    public string Data;
}
Rob Kennedy
Ezi

5 Answers

47

Old question, but I recently had to do this myself and all the existing answers are poor, so...

The best solution for marshaling a variable-length array in a struct is to use a custom marshaler. This lets you control the code that the runtime uses to convert between managed and unmanaged data. Unfortunately, custom marshaling is poorly-documented and has a few bizarre limitations. I'll cover those quickly, then go over the solution.

Annoyingly, you can't use custom marshaling on an array field of a struct or class. There's no documented or logical reason for this limitation, and the compiler won't complain, but you'll get an exception at runtime. Also, custom marshalers must implement a function, int GetNativeDataSize(), which is impossible to implement accurately: it isn't passed an instance of the object, so it can only go off the type, and the type is of course variable size! Fortunately, this function doesn't matter. I've never seen it get called, and the custom marshaler works fine even if it returns a bogus value (one MSDN example has it return -1).

First of all, here's what I think your native prototype might look like (I'm using P/Invoke here, but it works for COM too):

// Unmanaged C/C++ code prototype (guess)
//void DoThing (ABS_DATA *pData);

// Guess at your managed call with the "marshal one-byte ByValArray" version
//[DllImport("libname.dll")] public static extern void DoThing (ref abs_data pData);

Here's the naïve version of how you might have used a custom marshaler (which really ought to have worked). I'll get to the marshaler itself in a bit...

[StructLayout(LayoutKind.Sequential)]
public struct abs_data
{
    // Don't need the length as a separate field; managed arrays know it.
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(ArrayMarshaler<byte>))]
    public byte[] Data;
}

// Now you can just pass the struct but it takes arbitrary sizes!
[DllImport("libname.dll")] public static extern void DoThing (ref abs_data pData);

Unfortunately, at runtime, you apparently can't marshal arrays inside data structures as anything except SafeArray or ByValArray. SafeArrays are counted, but they look nothing like the (extremely common) format that you're looking for here, so that won't work. ByValArray requires that the length be known at compile time, so that doesn't work either (as you ran into). Bizarrely, though, you can use custom marshaling on array parameters. This is annoying because you have to put the MarshalAsAttribute on every parameter that uses this type, instead of putting it on one field and having it apply everywhere you use the type containing that field, but c'est la vie. It looks like this:

[StructLayout(LayoutKind.Sequential)]
public struct abs_data
{
    // Don't need the length as a separate field; managed arrays know it.
    // This isn't an array anymore; we pass an array of this instead.
    public byte Data;
}

// Now you pass an arbitrary-sized array of the struct
[DllImport("libname.dll")] public static extern void DoThing (
    // Have to put this huge stupid attribute on every parameter of this type
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(ArrayMarshaler<abs_data>))]
    // Don't need to use "ref" anymore; arrays are ref types and pass as pointer-to
    abs_data[] pData);

In that example, I preserved the abs_data type, in case you want to do something special with it (constructors, static functions, properties, inheritance, whatever). If your array elements consisted of a complex type, you would modify the struct to represent that complex type. However, in this case, abs_data is basically just a renamed byte - it's not even "wrapping" the byte; as far as the native code is concerned it's more like a typedef - so you can just pass an array of bytes and skip the struct entirely:

// Actually, you can just pass an arbitrary-length byte array!
[DllImport("libname.dll")] public static extern void DoThing (
    // Have to put this huge stupid attribute on every parameter of this type
    [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(ArrayMarshaler<byte>))]
    byte[] pData);

OK, so now you can see how to declare the array element type (if needed), and how to pass the array to an unmanaged function. However, we still need that custom marshaler. You should read "Implementing the ICustomMarshaler Interface" but I'll cover this here, with inline comments. Note that I use some shorthand conventions (like Marshal.SizeOf<T>()) that require .NET 4.5.1 or higher.

using System;
using System.Runtime.InteropServices;

// The class that does the marshaling. Making it generic is not required, but
// will make it easier to use the same custom marshaler for multiple array types.
public class ArrayMarshaler<T> : ICustomMarshaler
{
    // All custom marshalers require a static factory method with this signature.
    public static ICustomMarshaler GetInstance (String cookie)
    {
        return new ArrayMarshaler<T>();
    }

    // This is the function that builds the managed type - in this case, the managed
    // array - from a pointer. You can just return null here if only sending the 
    // array as an in-parameter.
    public Object MarshalNativeToManaged (IntPtr pNativeData)
    {
        // First, sanity check...
        if (IntPtr.Zero == pNativeData) return null;
        // Start by reading the size of the array ("Length" from your ABS_DATA struct)
        int length = Marshal.ReadInt32(pNativeData);
        // Create the managed array that will be returned
        T[] array = new T[length];
        // For efficiency, only compute the element size once
        int elSiz = Marshal.SizeOf<T>();
        // Populate the array
        for (int i = 0; i < length; i++)
        {
            array[i] = Marshal.PtrToStructure<T>(pNativeData + sizeof(int) + (elSiz * i));
        }
        // Alternate method, for arrays of primitive types only:
        // Marshal.Copy(pNativeData + sizeof(int), array, 0, length);
        return array;
    }

    // This is the function that marshals your managed array to unmanaged memory.
    // If you only ever marshal the array out, not in, you can return IntPtr.Zero
    public IntPtr MarshalManagedToNative (Object ManagedObj)
    {
        if (null == ManagedObj) return IntPtr.Zero;
        T[] array = (T[])ManagedObj;
        int elSiz = Marshal.SizeOf<T>();
        // Get the total size of unmanaged memory that is needed (length + elements)
        int size = sizeof(int) + (elSiz * array.Length);
        // Allocate unmanaged space. For COM, use Marshal.AllocCoTaskMem instead.
        IntPtr ptr = Marshal.AllocHGlobal(size);
        // Write the "Length" field first
        Marshal.WriteInt32(ptr, array.Length);
        // Write the array data
        for (int i = 0; i < array.Length; i++)
        {   // Newly-allocated space has no existing object, so the last param is false
            Marshal.StructureToPtr<T>(array[i], ptr + sizeof(int) + (elSiz * i), false);
        }
        // If you're only using arrays of primitive types, you could use this instead:
        //Marshal.Copy(array, 0, ptr + sizeof(int), array.Length);
        return ptr;
    }

    // This function is called after completing the call that required marshaling to
    // unmanaged memory. You should use it to free any unmanaged memory you allocated.
    // If you never consume unmanaged memory or other resources, do nothing here.
    public void CleanUpNativeData (IntPtr pNativeData)
    {
        // Free the unmanaged memory. Use Marshal.FreeCoTaskMem if using COM.
        Marshal.FreeHGlobal(pNativeData);
    }

    // If, after marshaling from unmanaged to managed, you have anything that needs
    // to be taken care of when you're done with the object, put it here. Garbage 
    // collection will free the managed object, so I've left this function empty.
    public void CleanUpManagedData (Object ManagedObj)
    { }

    // This function is a lie. It looks like it should be impossible to get the right 
    // value - the whole problem is that the size of each array is variable! 
    // - but in practice the runtime doesn't rely on this and may not even call it.
    // The MSDN example returns -1; I'll try to be a little more realistic.
    public int GetNativeDataSize ()
    {
        return sizeof(int) + Marshal.SizeOf<T>();
    }
}

Whew, that was long! Well, there you have it. I hope people see this, because there are a lot of bad answers and misunderstandings out there...
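A quick way to sanity-check a marshaler like this without a native DLL is to call its methods directly and round-trip an array through unmanaged memory. The sketch below is not the class above: ByteArrayMarshaler and MarshalerRoundTrip are hypothetical names, and it uses the Marshal.Copy shortcut for primitive element types mentioned in the code comments, assuming the same layout (a 4-byte length followed by the bytes).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical byte-only variant: Marshal.Copy replaces the per-element loop.
public sealed class ByteArrayMarshaler : ICustomMarshaler
{
    public static ICustomMarshaler GetInstance(string cookie) => new ByteArrayMarshaler();

    public IntPtr MarshalManagedToNative(object ManagedObj)
    {
        if (ManagedObj == null) return IntPtr.Zero;
        byte[] array = (byte[])ManagedObj;
        // Layout assumption: 4-byte length, then the payload bytes.
        IntPtr ptr = Marshal.AllocHGlobal(sizeof(int) + array.Length);
        Marshal.WriteInt32(ptr, array.Length);
        Marshal.Copy(array, 0, IntPtr.Add(ptr, sizeof(int)), array.Length);
        return ptr;
    }

    public object MarshalNativeToManaged(IntPtr pNativeData)
    {
        if (pNativeData == IntPtr.Zero) return null;
        int length = Marshal.ReadInt32(pNativeData);
        byte[] array = new byte[length];
        Marshal.Copy(IntPtr.Add(pNativeData, sizeof(int)), array, 0, length);
        return array;
    }

    public void CleanUpNativeData(IntPtr pNativeData) => Marshal.FreeHGlobal(pNativeData);
    public void CleanUpManagedData(object ManagedObj) { }
    public int GetNativeDataSize() => -1;
}

public static class MarshalerRoundTrip
{
    // Marshals to unmanaged memory, reads it back, then frees the native block.
    public static byte[] Run(byte[] input)
    {
        ICustomMarshaler marshaler = ByteArrayMarshaler.GetInstance("");
        IntPtr native = marshaler.MarshalManagedToNative(input);
        try { return (byte[])marshaler.MarshalNativeToManaged(native); }
        finally { marshaler.CleanUpNativeData(native); }
    }
}
```

Calling MarshalerRoundTrip.Run with a small array should hand back an equal copy. It only exercises the memory layout, not the P/Invoke plumbing, so a real call through DllImport is still worth testing separately.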

CBHacking
  • 1
    Thanks for the awesome summary! This helps me a lot in understanding the marshaling internals. I have a slightly worse situation, though: "struct Foo {int a; int b; /* unrelated fields */, int size, int arr[0]; }". Due to the additional fields a and b, I cannot pass it as an argument the way you did. Any suggestions? – wsxiaoys Apr 04 '17 at 03:27
  • @wsxiaoys: I don't see what the problem is... You write a custom marshaler for `Foo` that allocates the additional bytes for `a`, `b`, and your "unrelated fields" and populates them, then writes the array size and values at the appropriate offset. Unmarshaling is the same; read `a` and `b` out of the native pointer the same way we read the size above, but at the appropriate offsets, then get the size and allocate the managed array, then populate it. The managed representation of `Foo` probably doesn't need a `size` field, though you can include it if you want. – CBHacking Apr 05 '17 at 08:15
  • Thanks, very informative and useful. Unfortunately it doesn't seem to work in the "core" branch, 5.0 included; see https://github.com/dotnet/runtime/issues/8271 . +1 anyway for the great answer! – Mosè Bottacini Jan 17 '21 at 13:30
  • @MosèBottacini You can't use a custom marshaler on struct fields - as I said, the compiler doesn't complain but it produces a runtime error - and that's what the issue you linked is attempting. You can still marshal the function parameters, as I show, or at least you should be able to and if you can't, please open a new bug on GitHub (the one you linked is different). – CBHacking Jan 29 '21 at 11:31
  • @CBHacking maybe I didn't express myself well, but I thanked you for the great answer, and yes, the link I posted is *exactly* my case. Anyway, I've reengineered the code to be more ".NET 5.0" oriented, which is both bad and good: it has its pros, like much more control, at the cost of more code to write. Now I have Serialize and Deserialize methods that fill and extract to/from a Span in the intended way. – Mosè Bottacini Jan 30 '21 at 15:23
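The approach described in the comments for a struct with leading fields can be sketched roughly as follows. This is a hypothetical illustration, not code from the thread: it assumes the native layout is exactly int a; int b; int size; int arr[] with no other fields and no padding, and the names Foo and FooMarshaling are made up. The same read and write logic could sit inside a custom marshaler's MarshalManagedToNative and MarshalNativeToManaged methods.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical managed view of: struct Foo { int a; int b; int size; int arr[]; }
public sealed class Foo
{
    public int A;
    public int B;
    public int[] Arr = new int[0]; // no separate size field; the array knows its length
}

public static class FooMarshaling
{
    // Offsets assume three 4-byte ints before the array and no padding.
    private const int OffsetA = 0, OffsetB = 4, OffsetSize = 8, OffsetArr = 12;

    // Allocates a native block and writes the header followed by the elements.
    // The caller frees the result with Marshal.FreeHGlobal.
    public static IntPtr ToNative(Foo foo)
    {
        IntPtr ptr = Marshal.AllocHGlobal(OffsetArr + sizeof(int) * foo.Arr.Length);
        Marshal.WriteInt32(ptr, OffsetA, foo.A);
        Marshal.WriteInt32(ptr, OffsetB, foo.B);
        Marshal.WriteInt32(ptr, OffsetSize, foo.Arr.Length);
        Marshal.Copy(foo.Arr, 0, IntPtr.Add(ptr, OffsetArr), foo.Arr.Length);
        return ptr;
    }

    // Reads the header fields at their offsets, then copies the elements out.
    public static Foo FromNative(IntPtr ptr)
    {
        var foo = new Foo
        {
            A = Marshal.ReadInt32(ptr, OffsetA),
            B = Marshal.ReadInt32(ptr, OffsetB),
            Arr = new int[Marshal.ReadInt32(ptr, OffsetSize)]
        };
        Marshal.Copy(IntPtr.Add(ptr, OffsetArr), foo.Arr, 0, foo.Arr.Length);
        return foo;
    }
}
```

If the real struct has more fields before the array, the offsets need to follow the native compiler's layout, padding included.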
7

It is not possible to marshal structs containing variable-length arrays (but it is possible to marshal variable-length arrays as function parameters). You will have to read your data manually:

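IntPtr nativeData = ... ;
// Marshal has no ReadUInt32, so read the length as a signed 32-bit value
int length = Marshal.ReadInt32 (nativeData) ;
var bytes  = new byte[length] ;

// The data starts right after the 4-byte Length field
Marshal.Copy (IntPtr.Add (nativeData, 4), bytes, 0, length) ;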
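For anyone who wants to try the manual read without the real library, here is a minimal self-contained sketch. AbsDataReader and CreateNativeBlock are hypothetical names; the helper fakes a native ABS_DATA block in unmanaged memory, assuming the layout from the question (a 4-byte Length immediately followed by the bytes).

```csharp
using System;
using System.Runtime.InteropServices;

public static class AbsDataReader
{
    // Size of the ABS_DWORD Length header that precedes the bytes.
    private const int HeaderSize = sizeof(uint);

    // Copies the variable-length payload of a native ABS_DATA block into a managed array.
    public static byte[] Read(IntPtr nativeData)
    {
        if (nativeData == IntPtr.Zero) return null;

        // Marshal has no ReadUInt32, so the header is read as a signed value.
        int length = Marshal.ReadInt32(nativeData);
        if (length < 0) throw new InvalidOperationException("Length exceeds int.MaxValue.");

        byte[] bytes = new byte[length];
        // The data starts immediately after the 4-byte Length field.
        Marshal.Copy(IntPtr.Add(nativeData, HeaderSize), bytes, 0, length);
        return bytes;
    }

    // Test helper (an assumption, standing in for the native library): builds a
    // block with the same layout. The caller frees it with Marshal.FreeHGlobal.
    public static IntPtr CreateNativeBlock(byte[] payload)
    {
        IntPtr block = Marshal.AllocHGlobal(HeaderSize + payload.Length);
        Marshal.WriteInt32(block, payload.Length);
        Marshal.Copy(payload, 0, IntPtr.Add(block, HeaderSize), payload.Length);
        return block;
    }
}
```

With a pointer returned by the real API, only Read is needed. Whoever allocated the block remains responsible for freeing it with the matching allocator.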
Anton Tykhyy
  • 2
    Marshal.ReadUInt32() does not exist on the .NET 4.0 framework, sadly. Can anyone explain why? – Cameron Sep 20 '12 at 05:28
  • 1
    Probably because unsigned types are not CLS compliant. Use the `Marshal.ReadIntNn` functions and cast to unsigned manually. – Anton Tykhyy Sep 20 '12 at 06:21
6

If the data being saved isn't a string, you don't have to store it in a string. I usually do not marshal to a string unless the original data type was a char*. Otherwise a byte[] should do.

Try:

[MarshalAs(UnmanagedType.ByValArray, SizeConst = [whatever your size is])]
byte[] Data;

If you need to convert this to a string later, use:

System.Text.Encoding.UTF8.GetString(your byte array here). 

Obviously, you need to vary the encoding to what you need, though UTF-8 usually is sufficient.

I see the problem now, you have to marshal a VARIABLE length array. The MarshalAs does not allow this and the array will have to be sent by reference.

If the array length is variable, your byte[] needs to be an IntPtr, so you would use,

IntPtr Data;

Instead of

[MarshalAs(UnmanagedType.ByValArray, SizeConst = [whatever your size is])]
byte[] Data;

You can then use the Marshal class to access the underlying data.

Something like:

uint length = yourABSObject.Length;
byte[] buffer = new byte[length];

// Copy from the unmanaged pointer into the managed buffer
Marshal.Copy(yourABSObject.Data, buffer, 0, (int)length);

You may need to clean up your memory when you are finished to avoid a leak, though I suspect the GC will clean it up when yourABSObject goes out of scope. Anyway, here is the cleanup code:

Marshal.FreeHGlobal(yourABSObject.Data);
Peter Mortensen
Jonathan Henson
  • just as I did above, substitute the block I included for your section defining Data. Change SizeConst= to the definition of the ABS_VARLEN macro. – Jonathan Henson May 05 '11 at 18:15
  • 3
    The problem is that the length is based on public uint Length, but I can't fill that in there. – Ezi May 05 '11 at 18:18
  • Thanks, my problem is that if I convert the object to the type then it loses all the info! – Ezi May 05 '11 at 18:49
  • can you post some code showing the conversion as an update in your question above? – Jonathan Henson May 05 '11 at 18:50
  • Also, why is a cast needed? Why can't you have your original object declared as type abs_data? – Jonathan Henson May 05 '11 at 18:56
  • 2
    I think your answer would be easier to follow if instead of tacking "updates" onto the end of it, you simply *removed* the irrelevant parts. There's no need to keep a record of past iterations of your response within the question; the edit history already has that in case anyone really wants to know your original answer. – Rob Kennedy May 05 '11 at 18:58
  • If I declare it, it also stays empty! the only way it works is by doing: object obj = Marshal.PtrToStructure(ptr , bir.GetType); //that's good – Ezi May 05 '11 at 19:01
  • Sorry, I thought that I had read that they wanted us to keep the original and tack on updates. I am new. Plus, though the first part misses the point of the question, when I am reading someone else's question and answers, I like to see that "irrelevant" information. – Jonathan Henson May 05 '11 at 19:02
  • can I see the method call returning ptr? Furthermore, how can you tell if obj has data without the cast? It isn't null? But, I thought that you were always getting length back, so it would never be null anyways. – Jonathan Henson May 05 '11 at 19:03
  • It's OK, I got it to work, thank you very much. I'm using a direct cast. – Ezi May 05 '11 at 19:08
  • @Jonathan: if the C++ declaration were `ABS_BYTE* Data` it would be correct to just write `IntPtr Data`, but with `ABS_BYTE Data[]` the marshaler will do `managed.Data = *(IntPtr*)&unmanaged.Data[0]`. What is needed is `managed.Data = (IntPtr)&unmanaged.Data[0]` and I believe it is impossible to make the marshaler do this. – Anton Tykhyy May 05 '11 at 20:57
  • 3
    @Jonathan: `Marshal.FreeHGlobal` is not called for here, as there is no unmanaged array to free. The array is part of the unmanaged structure declared in the question. If the whole structure needs to be freed that is a separate matter. Anyway unmanaged pointers must be freed via the allocator which was used to allocate them in the first place and there are a lot of unmanaged memory allocators. – Anton Tykhyy May 05 '11 at 21:02
  • @Anton How are the two not the same? when you reference an array, you are referencing the pointer to the array on the heap, not an element on the stack. In c++, if I have a char ary[15]; ary == &ary[0]; How is the same not true here? Also, I wasn't sure about the Marshal.FreeHGlobal, I was just trying to point him in the right direction if he needed to free the memory. Thanks for the clarification. – Jonathan Henson May 05 '11 at 21:20
  • @Jonathan: `*(IntPtr*)&unmanaged.Data[0]` is an IntPtr composed of the first 4 (or 8 in a 64-bit environment) bytes of `unmanaged.Data`. `(IntPtr)&unmanaged.Data[0]` is an IntPtr equal to the pointer to the beginning of `unmanaged.Data`. The point is that the marshaler does not treat IntPtr in any special way, it is just a pointer-sized number. – Anton Tykhyy May 06 '11 at 06:23
  • 1
    @Anton I just don't see a major difference in the two c++ declarations ABS_BYTE* data is the same thing as referencing ABS_BYTE Data[] as Data. Thus, I don't see why they are marshaled differently. I am not doubting that they are, I just don't see why. – Jonathan Henson May 06 '11 at 06:27
  • @Anton, actually it just hit me why. Thanks for the lesson. However, when I tested this, it worked fine. Help me out here. – Jonathan Henson May 06 '11 at 06:32
  • @Jonathan: you're welcome. But to say anything on why whatever works I'd have to see the relevant parts of both managed and unmanaged code. – Anton Tykhyy May 06 '11 at 09:06
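The distinction drawn in the comments above, between reading a pointer out of the first bytes of Data and taking the address of Data, can be seen with a small experiment. This is only a sketch: InlineArrayPointerDemo is a hypothetical name, the offset is passed in explicitly, and it ignores the alignment padding that a real IntPtr field in a sequential struct would get on 64-bit.

```csharp
using System;
using System.Runtime.InteropServices;

public static class InlineArrayPointerDemo
{
    // Returns what an IntPtr field overlaid on Data would contain, and the
    // actual address of Data[0]. The block must hold at least
    // dataOffset + IntPtr.Size bytes.
    public static (IntPtr FieldValue, IntPtr DataAddress) Inspect(IntPtr block, int dataOffset)
    {
        // *(IntPtr*)&unmanaged.Data[0]: the first payload bytes reinterpreted as a pointer
        IntPtr fieldValue = Marshal.ReadIntPtr(block, dataOffset);
        // (IntPtr)&unmanaged.Data[0]: the address where the payload begins
        IntPtr dataAddress = IntPtr.Add(block, dataOffset);
        return (fieldValue, dataAddress);
    }
}
```

FieldValue is just whatever bytes happen to be in the payload, so dereferencing it would read from an arbitrary address. Only DataAddress is safe to hand to Marshal.Copy.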
3

You are trying to marshal something that is a byte[ABS_VARLEN] as if it were a string of length 1. You'll need to figure out what the ABS_VARLEN constant is and marshal the array as:

[MarshalAs(UnmanagedType.ByValArray, SizeConst = 1024)]
public byte[] Data;

(The 1024 there is a placeholder; fill in whatever the actual value of ABS_VARLEN is.)

Michael Edenfield
  • 3
    ABS_VARLEN means that it may always be a different length, based on public uint Length. – Ezi May 05 '11 at 18:14
  • 2
    There seems to be a lot of questions and misunderstanding about this construct from non-C programmers I would assume. Typically the struct defines the array of 1 to mean "and what will follow here in memory is some arbitrary number of bytes specified by the 'Length' member that you will need to walk through and interpret." The 1 doesn't necessarily mean 1 and there is no known-number of what it could be. It is purely determined at runtime on a case by case basis. – ribram Jun 30 '11 at 02:14
2

In my opinion, it's simpler and more efficient to pin the array and take its address.

Assuming you need to pass abs_data to myNativeFunction(abs_data*):

public struct abs_data
{
    public uint Length;
    public IntPtr Data;
}

[DllImport("myDll.dll")]
static extern void myNativeFunction(ref abs_data data);

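void CallNativeFunc(byte[] data)
{
    // Pin the array so the GC cannot move it while native code holds its address
    GCHandle pin = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        abs_data tmp;
        tmp.Length = (uint)data.Length; // Length is uint; array lengths are int
        tmp.Data = pin.AddrOfPinnedObject();

        myNativeFunction(ref tmp);
    }
    finally
    {
        // Always unpin, even if the native call throws
        pin.Free();
    }
}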
Benoit Blanchon