1

I'm trying to vectorize a n-dimensional array as 1-dimensional array in C# to later ease working using linear indexing (this whatever the type of the elements).

So far I was using Buffer.BlockCopy to do that (and even reshaping from n-dimensions to m-dimensions as long as the number of elements was not changing) but unfortunately I came across having to reshape arrays whose elements are not primitive types (double, single, int) and in this case Buffer.BlockCopy does not work (example array of string or whatever other non primitive type).

Currently the solution I have is to make special-case for non-primitive types:

/// <summary>Vectorize ND-array</summary>
/// <param name="arrayNd">ND-Array to vectorize.</param>
/// <returns>Surface copy as 1D array.</returns>
public static Array Vectorize(Array arrayNd)
{
    // Check arguments
    if (arrayNd == null) { return null; }
    var elementCount = arrayNd.Length;

    // Create 1D array
    var tarray = arrayNd.GetType();
    var telem = tarray.GetElementType();
    var array1D = Array.CreateInstance(telem, elementCount);

    // Surface copy
    if (telem.IsPrimitive)
    {
        // Block copy only works for array whose elements are primitive types (double, single, ...)
        var numberOfBytes = Buffer.ByteLength(arrayNd);
        Buffer.BlockCopy(arrayNd, 0, array1D, 0, numberOfBytes);
    }
    else
    {
        // Slow version for other element types
        // NB: arrayNd.GetValue(...) does not support linear indexing so need to compute indices for each dimension (very slow !!)
        var indices = new int[arrayNd.Rank];
        for (var i = 0; i < elementCount; i++)
        {
            var idx = i;
            for (var d = arrayNd.Rank - 1; d >= 0; d--)
            {
                var l = arrayNd.GetLength(d);
                indices[d] = idx % l;
                idx /= l;
            }

            array1D.SetValue(arrayNd.GetValue(indices), i);
        }
    }

    // Return as 1D
    return array1D;
}

So this works now all types:

var double1D = Vectorize(new double[3, 2, 5]); // Fast BlockCopy
var string1D = Vectorize(new string[3, 2, 5]); // Slow solution

I already have an NEnumerator class of my own to speed up computing indices (instead of using modulo as above) but maybe there is really fast way for just making this sort of "surface memcpy" ?

NB1: I'd like to avoid unsafe code but if it's the only way ...

NB2: I really want to work with System.Array (eventually I'll later do a bunch of T[] Vectorize(T[,,,,] array) overloads but that's not the issue)

DarkBee
  • 16,592
  • 6
  • 46
  • 58
CitizenInsane
  • 4,755
  • 1
  • 25
  • 56
  • Maybe you can come up with something with these: https://learn.microsoft.com/en-us/dotnet/standard/memory-and-spans/, https://stackoverflow.com/questions/52750582/span-and-two-dimensional-arrays – aybe Nov 17 '21 at 14:21
  • 1
    Do you really need this for arbitrary dimensions? I have rarely seen any array with more than 3 dimensions. Note that the array *value* might be a `Vector3` or some custom type. – JonasH Nov 17 '21 at 16:24
  • 1
    And even the 3d arrays I have seen in the wild have been using jagged array for performance and memory handling reasons, i.e. `T[][,]` rather than `T[,,]` – JonasH Nov 17 '21 at 16:28
  • 1
    @JonasH Data I work with are generally 2D to 4D (hardly ever 5D) and I wrote a library of Reshape, Sort, Permute, Mean, Max, etc.. functions in a very generic way (so it's easy to specialize and force to be type safe with `T[]`, `T[,]`, ... up to necessary) ... The idea is not to be extremely fast (`NEnumerator` of my own is quite ok) ... here I was wondering if a fast and ready-to-use `BlockCopy` equivalent exist(ed) for non primitives because it sound pretty obvious data are 1D in memory so should be more fast to just do a buffer-like copy. – CitizenInsane Nov 17 '21 at 17:29
  • I have done the same thing, but for non-primitve types I do a shallow copy with `Array.Copy()` since `Buffer.BlockCopy()` isn't supported. There is also the `Marshal.Copy()` function to be explored. – John Alexiou Nov 18 '21 at 09:21
  • @CitizenInsane - Trying to coax `C#` into a full array language is a noble and interesting endeavor. The addition of `Reshape()` alone inspired by Fortran would be a welcome addition to have. Having gone down the same roads myself, I ended up ditching native multi-dim arrays and creating a custom `Matrix` object with a linear array as a backing store with only support for rank 1 and rank 2 objects. But I can readily define diagonal, upper and lower triangular and symmetric matrices this way. – John Alexiou Nov 18 '21 at 09:37

1 Answers1

2

In my experience, Multidimensional arrays are kind of a pain to work with, in large part since it is so difficult to access the backing data. As far as I know there is no direct way to just copy all the elements for arbitrary types.

Because of this I tend to prefer a custom type for my 2D types that uses a linear array as backing storage, and index like myArray[y * width + x]. With this model the whole exercise becomes a no-op, and you can get a pointer to pass to native code, it works better with serialization etc.

For 3D/4D arrays you could use the same mode, but it seems like the best option for performance is allocate slices independently, i.e. myArray[z][y * width + x], at least for large arrays. I have not worked with 4D arrays, but in general, I would avoid multidimensional arrays if performance is a concern. There might also be libraries out there that might suit your needs, but I'm not aware of any specific one.

However, looking at your code I would expect there to be some possible improvements. You are currently doing N calls to GetLength, modulus & divisions for each element. So I would expect something like this to be a bit faster:

public static Array MultidimensionalToLinear(Array arr)
{
    var rank = arr.Rank;
    var lengths = new int[rank];

    for (int i = 0; i < rank; i++)
    {
        lengths[i] = arr.GetLength(i);
    }

    var linearLength = arr.Length;
    var result = Array.CreateInstance(arr.GetType().GetElementType(), linearLength);
    var index = new int[rank];
    var linearIndex = 0;
    CopyRecursive(0, index, result, ref linearIndex);

    void CopyRecursive(int rank, int[] index, Array result, ref int linearIndex)
    {
        var lastIndex = index.Length - 1;
        if (rank == lastIndex)
        {
            for (int i = 0; i < lengths[lastIndex]; i++)
            {
                index[lastIndex] = i;
                result.SetValue(arr.GetValue(index), linearIndex);
                linearIndex++;
            }
        }
        else
        {
            for (int i = 0; i < lengths[rank]; i++)
            {
                index[rank] = i;
                CopyRecursive(rank +1, index, result, ref linearIndex);
            }
        }
    }
    return result;
}

However, when measuring it seem like the performance improvement is fairly small. Probably due the code in GetValue dominating the runtime.

JonasH
  • 28,608
  • 2
  • 10
  • 23