I am trying to improve the usability of an open source C# API that wraps a C library. The underlying library pulls multiplexed 2D data from a server over a network connection. In C, the samples come out as a pointer to the data (many types are supported), e.g. float*. The pull function returns the number of data points (frames * channels, but channels is known and never changes) so that the client knows how much new data is being passed. It is up to the client to allocate enough memory behind these pointers. For example, if one wants to pull floats the function signature is something like:

long pull_floats(float *floatbuf);

and floatbuf better have sizeof(float)*nChannels*nMoreFramesThanIWillEverGet bytes behind it.

In order to accommodate this, the C# wrapper currently uses 2D arrays, e.g. float[,]. It is meant to be used as a literal mirror of the C function: the client allocates more memory than it ever expects to need for these arrays, and the wrapper returns the number of data points so that the client knows how many frames of data have just come in. The underlying dll handler has a signature like:

[DllImport(libname, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi, ExactSpelling = true)]
public static extern uint pull_floats(IntPtr obj, float[,] data_buffer);

And the C# wrapper itself has a definition like:

int PullFloats(float[,] floatbuf)
{
    // DllHandler has the DllImport code
    // Obj is the class with the handle to the C library
    uint res = DllHandler.pull_floats(Obj, floatbuf);
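    // res counts samples (frames * channels); GetLength(1) is the channel count.
    // uint / int widens to long in C#, so cast explicitly to return an int frame count.
    return (int)(res / (uint)floatbuf.GetLength(1));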
}

The C++ wrapper for this library is idiomatic. There, the client supplies a vector<vector<T>>& to the call and in a loop, each frame gets pushed into the multiplexed data container. Something like:

void pull_floats_cpp(std::vector<std::vector<float>>& floatbuf)
{
    std::vector<float> frame;
    floatbuf.clear();
    while(pull_float_cpp(frame)) //  C++ function to pull only one frame at a time
    {
       floatbuf.push_back(frame); // (memory may be allocated here)
    }
}

This works because a std::vector stores its elements contiguously, so &frame[0] (or frame.data()) is a plain float* into the vector's own storage. That is, the vector frame from above goes into a wrapper like:

void pull_float_cpp(std::vector<float>& frame)
{
    frame.resize(channel_count); // memory may be allocated here as well...
    pull_float_c(&frame[0]);
}

where pull_float_c has a signature like:

void pull_float_c(float* frame);

I would like to do something similar in the C# API. Ideally the wrapper method would have a signature like:

void PullFloats(List<List<float>> floatbuf);

instead of

int PullFloats(float[,] floatbuf);

so that clients don't have to work with 2D arrays and (more importantly) don't have to keep track of the number of frames they get. That should be inherent to the dimensions of the containing object so that clients can use enumeration patterns and foreach. But, unlike C++ std::vector, you can't pun a List to an array. As far as I know, ToArray allocates memory and makes a copy, so not only is memory being allocated, but the new data also doesn't go into the List of Lists that the array was built from.
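For what it's worth, newer runtimes do seem to offer something close to the C++ trick: CollectionsMarshal.AsSpan returns a Span<T> over a List<T>'s backing array without copying. Below is a minimal sketch, assuming .NET 5 or later. The FillFrame delegate and ListInterop.PullFrame are hypothetical names that stand in for the native single-frame pull; they are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the native call that fills one frame (assumption).
public delegate void FillFrame(Span<float> frame);

public static class ListInterop
{
    public static void PullFrame(List<float> frame, int channels, FillFrame fill)
    {
        // Make Count == channels, roughly what vector::resize does in the C++ wrapper.
        frame.Clear();
        for (int i = 0; i < channels; i++) frame.Add(0f);

        // Zero-copy view over the list's backing array.
        // The view is only valid as long as the list is not resized.
        Span<float> view = CollectionsMarshal.AsSpan(frame);
        fill(view);
    }
}
```

In a real wrapper the span would be pinned (for example with a fixed statement in an unsafe context) and the resulting pointer handed to the P/Invoke call; that part is left out here because it depends on how the import is declared.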

I hope that the pseudocode + explanation of this problem is clear. Any suggestions for how to tackle it in an elegant C# way are much appreciated. Or, if someone can assure me that this is simply a rift between C and C# that may not be breached without imitating C-style memory management, at least I would know not to think about this any more.

Could a MemoryStream or a Span help here?
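To make the Span idea concrete, here is a rough sketch of what it might look like. Span<float> is a ref struct, so it cannot be stored in a List or returned from an iterator; the sketch therefore hands out ArraySegment<float> views (which implement IList<float>) over one flat, pre-allocated array, and a caller who wants a Span can call AsSpan() on a segment. FrameBuffer and the pull delegate are hypothetical; the delegate stands in for the P/Invoke call and is assumed to return the number of samples written, like pull_floats.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: owns the buffer and tracks the frame count itself,
// so clients never deal with the raw sample count.
public sealed class FrameBuffer
{
    private readonly float[] _samples; // frame-major: frame0 ch0..chN, frame1 ch0..chN, ...
    private readonly int _channels;

    public int FrameCount { get; private set; }

    public FrameBuffer(int channels, int maxFrames)
    {
        _channels = channels;
        _samples = new float[channels * maxFrames];
    }

    // 'pull' stands in for the native call (assumption): it fills the array
    // and returns the number of samples (frames * channels) written.
    public void Pull(Func<float[], uint> pull)
    {
        uint samples = pull(_samples);
        FrameCount = (int)(samples / (uint)_channels);
    }

    // Zero-copy view of a single frame.
    public ArraySegment<float> this[int frame]
    {
        get
        {
            if ((uint)frame >= (uint)FrameCount)
                throw new ArgumentOutOfRangeException(nameof(frame));
            return new ArraySegment<float>(_samples, frame * _channels, _channels);
        }
    }

    // Lets clients write: foreach (var frame in buffer.Frames) { ... }
    public IEnumerable<ArraySegment<float>> Frames
    {
        get
        {
            for (int i = 0; i < FrameCount; i++)
                yield return this[i];
        }
    }
}
```

The trade-off is that clients get a custom type rather than List<List<float>>, but the frame count lives in the object and the views stay valid only until the next Pull.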

dmedine
  • I don't have a direct answer now but have you looked at C++/CLI yet? That allows you to write a wrapper in managed C++ so you can directly expose the .Net types you want. and you can copy data from unmanaged to managed memory in your C++/CLI and have full control there. (Short introduction : https://www.codeproject.com/Articles/19354/Quick-C-CLI-Learn-C-CLI-in-less-than-10-minutes) – Pepijn Kramer Jan 18 '22 at 05:30
  • Can't you just write a class that contains `float[,]` and tracks the size by itself? Also `vector<vector<T>>` is not a good API for the task, as here all vectors are of the same size and one can store them in a contiguous 2D array rather than perform an allocation for each sub-array separately. – ALX23z Jan 18 '22 at 05:33
  • @ALX23z I didn't choose `vector<vector<T>>`, but I think it was done more for usability than performance. The C++ API also has templates for `float*` type instead of the `std::vector` containers. In any case, I use the library all the time and have never noticed any bottleneck in this part of the code. I assumed that most compilers/runtimes/GCs would be intelligent enough not to do a free/alloc at every pass in this pattern. – dmedine Jan 18 '22 at 23:01
  • @PepijnKramer I will check that out. C++/CLI may be exactly what I was looking for. The wrapper class for `float[,]` is an interesting idea too. In fact I started to play around with a design like this already, but in the end I would prefer that the client have access to the data in something generic and familiar like `List`. – dmedine Jan 18 '22 at 23:04
  • @ALX23z, I should add that these arrays typically aren't ever very large. It could be the case that there is usually enough stack memory available to keep this moving quick. – dmedine Jan 19 '22 at 00:08
  • `std::vector` doesn't use any short vector optimizations. It necessarily uses dynamic memory allocation. There are very specific rare cases where it can be optimized but you have this data as an output from hidden function. So 0% for any optimizations. If their size is small then it is a typical case of memory fragmentation. It might not be an issue if you don't use it much. – ALX23z Jan 19 '22 at 02:24
  • Also, the outer vector is as long as the number of frames. This may move around a bit but will usually be about the same each time. So this memory gets reused and even though `clear()` gets called, if I understand correctly, the GC isn't going to free the memory each time. Maybe I am misunderstanding this, though. The inner vectors will incur a free/alloc penalty each time, though. – dmedine Jan 19 '22 at 03:27
  • @dmedine yep. But what does one alloc/free mean when one does alloc/free per element? If channel count is compile-time fixed then you can replace the inner `std::vector` with `std::array`. – ALX23z Jan 19 '22 at 10:10
  • @ALX23z, unfortunately the channel count depends on runtime conditions. – dmedine Jan 19 '22 at 23:37

1 Answer

I came up with a pretty satisfactory way to wrap pre-allocated arrays in Lists. Please let me know if there is a better way to do this, but according to this I think this is about as good as it gets, at least if the answer is to make a List out of an array. According to my debugger, 100,000 iterations of 5000 or so floats at a time take less than 12 seconds (which is far better than the underlying library demands in practice, but worse than I would like to see), the memory use stays flat at around 12 MB (no copies), and the GC isn't called until the program exits:

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

namespace ListArrayTest
{
    [StructLayout(LayoutKind.Explicit, Pack = 2)]
    public class GenericDataBuffer
    {
        [FieldOffset(0)]
        public int _numberOfBytes;
        [FieldOffset(8)]
        private readonly byte[] _byteBuffer;
        [FieldOffset(8)]
        private readonly float[] _floatBuffer;
        [FieldOffset(8)]
        private readonly int[] _intBuffer;

        public byte[] ByteBuffer => _byteBuffer;
        public float[] FloatBuffer => _floatBuffer;
        public int[] IntBuffer => _intBuffer;

        public GenericDataBuffer(int sizeToAllocateInBytes)
        {
            int aligned4Bytes = sizeToAllocateInBytes % 4;
            sizeToAllocateInBytes = (aligned4Bytes == 0) ? sizeToAllocateInBytes : sizeToAllocateInBytes + 4 - aligned4Bytes;
            // Allocating the byteBuffer is co-allocating the floatBuffer and the intBuffer
            _byteBuffer = new byte[sizeToAllocateInBytes];
            _numberOfBytes = _byteBuffer.Length;
        }

        public static implicit operator byte[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._byteBuffer;
        }
        public static implicit operator float[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._floatBuffer;
        }
        public static implicit operator int[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._intBuffer;
        }


    }

    public class ListArrayTest<T>
    {
        private readonly Random _random = new();
        const int _channels = 10;
        const int _maxFrames = 500;
        private readonly T[,] _array = new T[_maxFrames, _channels];
        private readonly GenericDataBuffer _genericDataBuffer;
        int _currentFrameCount;
        public int CurrentFrameCount => _currentFrameCount;

        // generate 'data' to pull
        public void PushValues()
        {
            int frames = _random.Next(_maxFrames);
            if (frames == 0) frames++;
            for (int ch = 0; ch < _array.GetLength(1); ch++)
            {
                for (int i = 0; i < frames; i++)
                {
                    switch (_array[0, 0]) // in real life this is done with type enumerators
                    {
                        case float: // only implementing float to be concise
                            _array[i, ch] = (T)(object)(float)i;
                            break;
                    }
                }
            }
            _currentFrameCount = frames;
        }

        private void CopyFrame(int frameIndex)
        {
            for (int ch = 0; ch < _channels; ch++)
                switch (_array[0, 0]) // in real life this is done with type enumerators
                {
                    case float: // only implementing float to be concise
                        _genericDataBuffer.FloatBuffer[ch] = (float)(object)_array[frameIndex, ch];
                        break;
                }
        }

        private void PullFrame(List<T> frame, int frameIndex)
        {

            frame.Clear();
            CopyFrame(frameIndex);
            for (int ch = 0; ch < _channels; ch++)
            {
                switch (frame)
                {
                    case List<float>: // only implementing float to be concise
                        frame.Add((T)(object)BitConverter.ToSingle(_genericDataBuffer, ch * 4));
                        break;
                }
            }
        }

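        public void PullChunk(List<List<T>> list)
        {
            // Each frame needs its own List<T>; adding one shared instance would
            // make every row alias the same (last) frame. Inner lists left over
            // from earlier calls are reused to avoid reallocating them.
            for (int frameIndex = 0; frameIndex < _currentFrameCount; frameIndex++)
            {
                if (frameIndex == list.Count) list.Add(new List<T>(_channels));
                PullFrame(list[frameIndex], frameIndex);
            }
            // Drop rows that belong to a previous, longer chunk.
            if (list.Count > _currentFrameCount)
                list.RemoveRange(_currentFrameCount, list.Count - _currentFrameCount);
        }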

        public ListArrayTest()
        {
            switch (_array[0, 0])
            {
                case float:
                    _genericDataBuffer = new(_channels * 4);
                    break;
            }
        }
    }


    internal class Program
    {
        static void Main(string[] args)
        {
            ListArrayTest<float> listArrayTest = new();
            List<List<float>> chunk = new();
            for (int i = 0; i < 100; i++)
            {
                listArrayTest.PushValues();
                listArrayTest.PullChunk(chunk);
                Console.WriteLine($"{i}: first value: {chunk[0][0]}");
            }
        }
    }
}

Update

...and, using a nifty trick I found from Mark Heath (https://github.com/markheath), I can effectively type pun List<List<T>> back to a T* the same way as does the C++ API with std::vector<std::vector<T>> (see class GenericDataBuffer). It is a lot more complicated under the hood since one must be so verbose with type casting in C#, but it compiles without complaint and it works like a charm. Here is the blog post I stole the idea from: https://www.markheath.net/post/wavebuffer-casting-byte-arrays-to-float.

This also means clients are no longer responsible for pre-allocating, at the cost (as in the C++ wrapper) of having to do a bit of dynamic allocation internally. According to the debugger the GC doesn't get called and the memory stays flat, so I guess the List allocations are not relying on digging into the heap.
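As a possible alternative to the explicit-layout union in GenericDataBuffer, newer runtimes can reinterpret a byte[] as floats with MemoryMarshal.Cast, which also avoids a copy. This is only a sketch, assuming .NET Core 2.1 or later (or the System.Memory package); BufferViews and AsFloats are hypothetical names, not something from the blog post or the library.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BufferViews
{
    // Reinterprets the byte array as floats without copying.
    // The resulting span has bytes.Length / 4 elements; trailing bytes that
    // do not fill a whole float are not visible through the view.
    public static Span<float> AsFloats(byte[] bytes)
    {
        return MemoryMarshal.Cast<byte, float>(bytes.AsSpan());
    }
}
```

Writes through the returned span land in the original byte array, so reading the same bytes back with BitConverter.ToSingle gives the value that was written. The catch is that a Span cannot be kept in a field of a class, so the view has to be recreated wherever it is needed.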

dmedine