The following is a simple console application written in C++:

#include <iostream>
using namespace std;

int main()
{
    const __int32 length = 4;
    __int32 ints[length] = {1, 2, 3, 4 };
    __int32* intArray = ints;
    __int64* longArray = (__int64*)intArray;
    for (__int32 i = 0; i < length; i++) cout << intArray[i] << '\n';
    cout << '\n';
    for (__int32 i = 0; i < length / 2; i++) cout << longArray[i] << '\n';
    cout << '\n';
    cout << "Press any key to exit.\n";
    cin.get();
}

The program takes an array of four 32-bit signed integers and reinterprets it as an array of two 64-bit signed integers. It is highly efficient because the only operation is a cast of the pointer to a different type.

In C#, the equivalent can be done by creating a new array of the target type and copying the memory of the original array into it. This can be written very quickly with the System.Runtime.InteropServices.Marshal class. However, it is vastly inefficient for larger arrays because of the overhead of copying many megabytes of data.

Additionally, there are situations where one would want two arrays of different unmanaged types to refer to the same location in memory, for instance to perform operations through one array and see the changes through the other.

To be clear, I want to convert the arrays bit by bit, not value by value. If that does not make sense, this was the output from the console:

1
2
3
4

8589934593
17179869187
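
To make the bit-by-bit meaning concrete, here is a minimal C++ sketch (not part of the original program) that produces the same numbers without the pointer cast. The helper names `reinterpret_as_int64` and `combine_little_endian` are made up for illustration. The arithmetic assumes a little-endian machine, where the first 32-bit element of each pair becomes the low half of the 64-bit value: 1 + 2 * 2^32 = 8589934593 and 3 + 4 * 2^32 = 17179869187.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterpret pairs of 32-bit integers as 64-bit integers by copying their bytes.
// std::memcpy sidesteps the aliasing problem of casting the pointer directly.
std::vector<std::int64_t> reinterpret_as_int64(const std::vector<std::int32_t>& ints)
{
    std::vector<std::int64_t> longs(ints.size() / 2);  // a trailing odd element is ignored
    if (!longs.empty())
        std::memcpy(longs.data(), ints.data(), longs.size() * sizeof(std::int64_t));
    return longs;
}

// The value a little-endian machine produces for one pair (assumption: little-endian):
// `low` fills bits 0..31 and `high` fills bits 32..63.
std::int64_t combine_little_endian(std::int32_t low, std::int32_t high)
{
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(high)) << 32) |
        static_cast<std::uint32_t>(low);
    return static_cast<std::int64_t>(bits);
}
```

On a little-endian machine, `reinterpret_as_int64({1, 2, 3, 4})` should give the two values printed above. Unlike the pointer cast, this version copies the bytes, so it does not share memory with the source.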
Jan Schultke
256Bits
  • https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.memorymarshal.cast?view=net-7.0 – GSerg Jun 30 '23 at 12:46
  • This code is a strict aliasing violation in C++. It is not allowed and contains undefined behavior. You cannot access an `__int32` through a `__int64*`. What you're seeing is just a "works on my machine" effect. – Jan Schultke Jun 30 '23 at 12:46
  • Pretty ironic that C++ in principle doesn't allow this and C# does – harold Jun 30 '23 at 12:49
  • Also, you should not use `__int32` but `std::int32_t`; identifiers starting with `__` are for internal library and compiler use only – Pepijn Kramer Jun 30 '23 at 12:50
  • @harold Why? They are different languages with different grammars and rules. And this aliasing rule allows for certain compiler optimizations to happen. Also, .NET runs on an abstract machine in which things like endianness and memory layout are standardized. C++ must be able to cope with a lot of hardware variance. – Pepijn Kramer Jun 30 '23 at 12:52
  • By the way, you could have received more obvious output by printing in hexadecimal format... Not that it would change anything about the undefined behaviour, though. – Aconcagua Jun 30 '23 at 12:52
  • Side note: About [`using namespace std`](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice)... – Aconcagua Jun 30 '23 at 12:54
  • `It is highly efficient` not really. First, that's not really C++, that's C. Second, the pointer operations mean the CPU cache isn't being used. Any assumed performance benefits are eradicated by the orders of magnitude slower RAM access. Finally, it's impossible to use SIMD operations on the array. The .NET runtime already uses SIMD operations in several cases. – Panagiotis Kanavos Jun 30 '23 at 12:55
  • @PanagiotisKanavos it is still possible to use SIMD, and pointers do not make memory uncacheable – harold Jun 30 '23 at 12:56
  • Actually, the correct type for specifying array sizes is `size_t`. – Aconcagua Jun 30 '23 at 12:56
  • @harold C++ allows you to do this, you just need to be explicit about it. For example, you can "type-pun" the array using `std::begin_lifetime_as_array`. You could also use `std::bit_cast` or `std::memcpy` on the fly for individual changes. It would have negative performance implications if it was allowed to type pun in general. – Jan Schultke Jun 30 '23 at 12:57
  • @JanSchultke Well, `memcpy` *always* has a performance impact… – Aconcagua Jun 30 '23 at 13:05
  • @Aconcagua it's actually optimized out in many cases in this context (reinterpreting things), making the source code look really ugly (and not expressing the actual intent of the code) even though the resulting assembly does the right thing (i.e. actually directly reinterpret something, not copying via an intermediary) – harold Jun 30 '23 at 13:08
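
As the comment about hexadecimal output suggests, printing the 64-bit values in hex makes the two packed 32-bit halves visible. The following is a small C++ sketch of that idea; the helper name `to_hex64` is hypothetical and chosen only for this illustration.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Format a 64-bit value as "0x" followed by 16 zero-padded hexadecimal digits,
// so the upper and lower 32-bit halves can be read off directly.
std::string to_hex64(std::int64_t value)
{
    std::ostringstream out;
    out << "0x" << std::hex << std::setfill('0') << std::setw(16)
        << static_cast<std::uint64_t>(value);
    return out.str();
}
```

For example, `to_hex64(8589934593)` yields `0x0000000200000001`: the upper eight digits hold 2 and the lower eight hold 1.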

1 Answer

You can use Span<T> to do this without copying the array:

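using System;
using System.Runtime.InteropServices; // MemoryMarshal

int[] source = { 1, 2, 3, 4 };

// Reinterpret the same memory as longs; no data is copied.
Span<long> dest = MemoryMarshal.Cast<int, long>(source.AsSpan());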

foreach (var element in dest)
{
    Console.WriteLine(element); // Outputs 8589934593 and 17179869187
}

However, if you must have the data as an array, you will have to make a copy.

If you can accept unsafe code, this is likely to be slightly faster (but probably not by so much as to make it worth using unsafe code):

int[] source = { 1, 2, 3, 4 };

unsafe
{
    fixed (int* p = source)
    {
        long* q = (long*)p;

        for (int i = 0; i < source.Length/2; i++)
        {
            Console.WriteLine(*q++);
        }
    }
}

Another approach (which is slower but will give you the data in a separate array) is to use Buffer.BlockCopy(). If you can pre-allocate and reuse the destination array, you can save the overhead of allocating the destination - but you still pay to copy all the data.

int[] source = { 1, 2, 3, 4 };
long[] dest = new long[source.Length/2];

Buffer.BlockCopy(source, 0, dest, 0, sizeof(int) * source.Length);

foreach (var element in dest)
{
    Console.WriteLine(element);
}

We should never make performance decisions without benchmarks, so let's try some:

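using System;
using System.Linq;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]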
public class Benchmarks
{
    [Benchmark]
    public void BlockCopy()
    {
        viaBlockCopy();
    }

    static long viaBlockCopy()
    {
        Buffer.BlockCopy(source, 0, dest, 0, sizeof(int) * source.Length);

        long total = 0;

        for (int i = 0; i < dest.Length; ++i)
            total += dest[i];

        return total;
    }

    [Benchmark]
    public void Unsafe()
    {
        viaUnsafe();
    }

    static long viaUnsafe()
    {
        unsafe
        {
            fixed (int* p = source)
            {
                long* q = (long*)p;
                long* end = q + source.Length / 2;

                long total  = 0;

                while (q != end)
                    total += *q++;

                return total;
            }
        }
    }

    [Benchmark]
    public void Span()
    {
        viaSpan();
    }

    static long viaSpan()
    {
        Span<long> result = MemoryMarshal.Cast<int, long>(source.AsSpan());

        long total = 0;

        foreach (var element in result)
        {
            total += element;
        }

        return total;
    }

    static readonly int[]  source = Enumerable.Range(0, 1024 * 1024).ToArray();
    static readonly long[] dest   = new long[1024 * 1024/2];
}

Note that the BlockCopy() benchmark is reusing the dest buffer to avoid the overhead of creating an output array. If your code has to create an output buffer for each call, it would be significantly slower.

And the results:

|    Method |     Mean |   Error |  StdDev | Allocated |
|---------- |---------:|--------:|--------:|----------:|
| BlockCopy | 362.7 us | 3.53 us | 3.30 us |         - |
|    Unsafe | 108.6 us | 0.68 us | 0.57 us |         - |
|      Span | 134.4 us | 0.37 us | 0.33 us |         - |

You can make up your own mind whether unsafe code is worth the extra performance (personally, I avoid unsafe code altogether).

Also note that these benchmarks include the time to iterate over all the elements of the result. If you omit that part, then for the Span and unsafe methods you'll just end up measuring the tiny amount of time needed to "cast" the data.

For completeness, here are the times if you remove the total calculation from the benchmarks (note that the numbers are in nanoseconds rather than microseconds!):

|    Method |            Mean |         Error |        StdDev | Allocated |
|---------- |----------------:|--------------:|--------------:|----------:|
| BlockCopy | 108,173.6835 ns | 1,591.5239 ns | 1,328.9946 ns |         - |
|    Unsafe |       0.9529 ns |     0.0105 ns |     0.0088 ns |         - |
|      Span |       1.1429 ns |     0.0042 ns |     0.0033 ns |         - |

Now you can see why I added in the total calculations...

Matthew Watson