12

I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!

I am looking for a byte array four times as long as the float array, since each float is composed of 4 bytes. I'll pass this to a BinaryWriter.

EDIT: To those critics screaming "premature optimization": I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with pinvoke'd win32 API. The optimization occurs since this lessens the number of function calls.

And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.

So I guess the lesson here is not to make premature assumptions ;)

Nick
  • What do you actually want? Every float cast to a byte, or an array four times longer containing the byte representation of the floats? – Khoth Mar 06 '09 at 14:36
  • What does "four times longer" mean? – ryeguy Mar 06 '09 at 15:04
  • It would help to know what you plan to use the bytes *for* afterwards. The answer you accepted is not optimal in several situations if you are willing to use unsafe code... – ShuggyCoUk Mar 06 '09 at 15:35
  • He says in the question... "I'll pass this to a BinaryWriter". – jdmichal Mar 06 '09 at 15:54
  • That's not what he wants, that's how he is trying to achieve what he wants. if this is going into a stream he can do better than binary writer... – ShuggyCoUk Mar 08 '09 at 13:51
  • Nick, check out my answer below. It'll do the job: no iteration, no memory allocations. If you can live with the "hackiness" of it, then go for it. – Omer Mor Aug 29 '10 at 05:33
  • An array of floats into an array of bytes? So 2 floats would take 8 bytes? Or have I misunderstood. – Chris S Mar 06 '09 at 15:11

9 Answers

22

There is a quick and dirty way of doing this that needs no unsafe code:

[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
    [FieldOffset(0)]
    public Byte[] Bytes;

    [FieldOffset(0)]
    public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
    BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
    Double result = 0;
    // Doubles.Length still reports the byte count, so divide by sizeof(Double).
    for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
    {
        result += convert.Doubles[i];
    }
    return result;
}

This will work, but I'm not sure of the support on Mono or newer versions of the CLR. The only strange thing is that array.Length is still the length in bytes. That is because the length is stored with the array object itself, and since the array was created as a byte array, the stored length remains the byte count. The indexer, however, does account for a Double being eight bytes wide, so no offset calculation is needed there.

I've looked for it some more, and it's actually described on MSDN, How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic), so chances are this will be supported in future versions. I am not sure about Mono though.

Davy Landman
21

Premature optimization is the root of all evil! @Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):

Elements      BinaryWriter(float)      BinaryWriter(byte[])
-----------------------------------------------------------
10               8.72ms                    8.76ms
100              8.94ms                    8.82ms
1000            10.32ms                    9.06ms
10000           32.56ms                   10.34ms
100000         213.28ms                  739.90ms
1000000       1955.92ms                10668.56ms

There is little difference between the two for small numbers of elements. Once the element count gets large (100,000 and up in the table above), the time spent copying from the float[] to the byte[] far outweighs the benefits.

So go with what is simple:

float[] data = new float[...];
foreach(float value in data)
{
    writer.Write(value);
}
user7116
  • I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with win32 API. ;) – Nick Mar 07 '09 at 00:09
  • 1
    Good good, but I would add that unless you're writing millions of floats or executing this thousands of times, ~200ms is an unimportant number in the grand scheme of program execution. – user7116 Mar 07 '09 at 01:07
  • There is a sweet spot at 10,000, 3 times faster (or is it a typo? - should it be 30.34 ms?) - how do you explain that? – Peter Mortensen Jun 18 '17 at 11:02
  • Are you comparing this against `foreach (byte b in bytedata) { writer.Write(b); }`? Because that's a fairly silly compare, the whole reason why you want this to bytes is so you can use `writer.Write(bytedata)` directly, skipping the massive overhead per Write call. Writing 1MB to disk should not take 2 seconds, that's just plain absurd. You'd need a week to write a full PC backup this way. – Peter May 14 '21 at 10:13
17

There is a way which avoids memory copying and iteration.

You can use a really ugly hack to temporarily change your array to another type using (unsafe) memory manipulation.

I tested this hack on both 32-bit and 64-bit operating systems, so it should be portable.

The source + sample usage is maintained at https://gist.github.com/1050703 , but for your convenience I'll paste it here as well:

public static unsafe class FastArraySerializer
{
    [StructLayout(LayoutKind.Explicit)]
    private struct Union
    {
        [FieldOffset(0)] public byte[] bytes;
        [FieldOffset(0)] public float[] floats;
    }

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    private struct ArrayHeader
    {
        public UIntPtr type;
        public UIntPtr length;
    }

    private static readonly UIntPtr BYTE_ARRAY_TYPE;
    private static readonly UIntPtr FLOAT_ARRAY_TYPE;

    static FastArraySerializer()
    {
        fixed (void* pBytes = new byte[1])
        fixed (void* pFloats = new float[1])
        {
            BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
            FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
        }
    }

    public static void AsByteArray(this float[] floats, Action<byte[]> action)
    {
        if (floats.handleNullOrEmptyArray(action)) 
            return;

        var union = new Union {floats = floats};
        union.floats.toByteArray();
        try
        {
            action(union.bytes);
        }
        finally
        {
            union.bytes.toFloatArray();
        }
    }

    public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
    {
        if (bytes.handleNullOrEmptyArray(action)) 
            return;

        var union = new Union {bytes = bytes};
        union.bytes.toFloatArray();
        try
        {
            action(union.floats);
        }
        finally
        {
            union.floats.toByteArray();
        }
    }

    public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
    {
        if (array == null)
        {
            action(null);
            return true;
        }

        if (array.Length == 0)
        {
            action(new TDst[0]);
            return true;
        }

        return false;
    }

    private static ArrayHeader* getHeader(void* pBytes)
    {
        return (ArrayHeader*)pBytes - 1;
    }

    private static void toFloatArray(this byte[] bytes)
    {
        fixed (void* pArray = bytes)
        {
            var pHeader = getHeader(pArray);

            pHeader->type = FLOAT_ARRAY_TYPE;
            pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
        }
    }

    private static void toByteArray(this float[] floats)
    {
        fixed(void* pArray = floats)
        {
            var pHeader = getHeader(pArray);

            pHeader->type = BYTE_ARRAY_TYPE;
            pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
        }
    }
}

And the usage is:

var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
    foreach (var b in bytes)
    {
        Console.WriteLine(b);
    }
});
Omer Mor
  • 1
    -1 for being completely non-portable. Have you even tried this on a 64-bit machine? – Gabe Oct 28 '10 at 05:04
  • 2
    Nope - it's a hack. If and when I get access to a 64-bit machine, I might check it out and perhaps adapt it. It is also not future-proof. In CLR v.Next it might be completely broken. There is a trade-off here: you can use a more robust solution and pay in performance, or use the fastest way I can think of and live on the edge :-) – Omer Mor Oct 29 '10 at 13:50
  • 1
    I got a chance to use this on a 64-bit machine, so I made the code portable. – Omer Mor Jun 28 '11 at 08:25
  • 1
    +1 :-) Thanks for this! I use this method with custom structures, and it is indeed hellza helpful. – Robert Fraser Oct 06 '11 at 06:22
  • 1
    +1 Pretty rad. I must ask, did you find any documentation on the memory layout for the *type* and *length* "fields" (for lack of a better word) of the arrays? I mean, how did you come up with this: `FLOAT_ARRAY = *(UIntPtr*)(((byte*) pFloats) - 2*PTR_SIZE);` ? – Cristian Diaconescu Jun 12 '13 at 14:44
  • Note to self and others: This article gets to the deeper end of the pool regarding internal type representation for .NET 2.0. http://www.codeproject.com/Articles/20481/NET-Type-Internals-From-a-Microsoft-CLR-Perspecti – Cristian Diaconescu Jun 12 '13 at 15:08
  • Thanks. I deduced the array header metadata fields using "reverse engineering" and some trial and error: I opened a memory window in visual studio, tinkered with the values, and deduced the layout. I updated the code to make it a little clearer. – Omer Mor Jun 12 '13 at 21:42
  • 2
    This hack is corrupting the internal garbage collector data structures. It will cause intermittent crashes, data corruptions, and security bugs of the same class as use-after-free in C++. Hacking internal garbage collector data structures like this is absolutely not supported by the .NET runtime. https://github.com/HelloKitty/Reinterpret.Net/issues/1 has a long discussion about the crashes that this hack will lead to. – Jan Kotas Sep 23 '17 at 18:15
  • 1
    @JanKotas thanks for the discussion link. Very interesting! I guess I could pin the array for the entire scope of the As{Float,Byte}Array() functions to prevent such corruptions. What do you think? – Omer Mor Jan 03 '18 at 19:37
  • @OmerMor, I think you are right because (a) the garbage collector won't move it while pinned, and (b) the garbage collector won't traverse it because it is an array of simple values. – Oliver Bock Sep 30 '20 at 07:35
7

If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().

public static void BlockCopy(
    Array src,
    int srcOffset,
    Array dst,
    int dstOffset,
    int count
)

For example:

float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];

Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
Jeremy
  • 2
    This will double the amount of memory allocation *in addition* to iterating over your *two* arrays (once to copy, once to write). Very inefficient both speed-wise and memory-wise. Not recommended. – vladr Mar 06 '09 at 15:17
  • Doesn't the last parameter need to be multiplied by sizeof(float)? – jdmichal Mar 06 '09 at 15:17
  • Actually, you should probably just use Buffer.ByteLength: http://msdn.microsoft.com/en-us/library/system.buffer.bytelength.aspx – jdmichal Mar 06 '09 at 15:27
  • 2
    You are better off to just iterate over the float[] array and call Write for each float. This solution is highly inefficient. – vladr Mar 06 '09 at 15:28
  • Didn't know about that method, thanks! As for efficiency, whenever I have used BlockCopy, I had a byte[] and needed a float[] so there was no unneeded duplication. Plus if you stick with BlockCopy, you do not need unsafe code which can be advantageous. Pick the best method for your needs. – Jeremy Mar 06 '09 at 15:32
  • @Jeremy: I didn't either, until 5 seconds before that comment :) @Vlad: Please just rate it up or down. No need to repeatedly post the same comment (while advertising your own answer). Let the asker and the users decide what is helpful. That's why the rating system exists. – jdmichal Mar 06 '09 at 15:37
  • Posted answer which confirms @Vlad's suspicions – user7116 Mar 06 '09 at 15:41
  • @rstevens: you would have to use Marshal.SizeOf(typeof(float)), but the CLI standard says sizeof(float) should be 32bits. – user7116 Mar 06 '09 at 16:46
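
Pulling the comments above together, here is a minimal sketch of the BlockCopy approach with a reusable buffer, as the question describes ("allocate the byte buffer once and re-use it many times"). The class name `FloatBlockWriter` and its `Write` helper are hypothetical names chosen for illustration, not part of any library; `Buffer.ByteLength` is used for the count, as suggested in the comments.

```csharp
using System;
using System.IO;

public static class FloatBlockWriter
{
    // Hypothetical helper: copies the raw bytes of 'source' into a
    // caller-owned 'scratch' buffer, then hands them to the writer in one call.
    // The scratch buffer can be allocated once and reused across calls.
    public static void Write(BinaryWriter writer, float[] source, byte[] scratch)
    {
        // Buffer.ByteLength returns the array size in bytes
        // (source.Length * sizeof(float) for a float[]).
        int byteCount = Buffer.ByteLength(source);
        if (scratch.Length < byteCount)
            throw new ArgumentException("Scratch buffer is too small.", nameof(scratch));

        // The count argument of BlockCopy is in bytes, not elements.
        Buffer.BlockCopy(source, 0, scratch, 0, byteCount);
        writer.Write(scratch, 0, byteCount);
    }
}
```

The bytes come out in the machine's native byte order, which matches what `BinaryWriter.Write(float)` produces on little-endian hardware. Whether the single large write beats per-float writes depends on the underlying stream, so it is worth profiling, as the asker did.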
3

You're better off letting the BinaryWriter do this for you. There's going to be iteration over your entire set of data regardless of which method you use, so there's no point in playing with bytes.

1

Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to a byte[] for the writer to accept as a parameter without copying the data. You do not want to do that, as it will double your memory footprint and add an extra iteration on top of the inevitable one that is needed to output the data to disk.

Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(float) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.

vladr
  • Not sure what you mean. I just want byte-level indexing into the floating-point array (actually, I'm passing the array to a Writer). – Nick Mar 06 '09 at 14:48
  • @Vlad: What is this supposed to mean? How can a datatype not be representable as bytes? See my answer. – ryeguy Mar 06 '09 at 14:58
  • it means that the binary representation of (float)0 and that of (byte)0 are not the same (for one they don't have the same size.) – vladr Mar 06 '09 at 15:04
  • Doesn't seem to work: error CS1503: Argument '1': cannot convert from 'byte*' to 'byte[]' – Nick Mar 06 '09 at 15:05
  • Vlad is correct, you cannot fake the bits in memory that constitute a float[] as a byte[]. You CAN get a byte* to the front of the array, which is likely sufficient for your needs, but a byte* cannot be magicked into a byte[] – ShuggyCoUk Mar 06 '09 at 15:32
  • Please see my edit which explains why, in my specific case, Jeremy's answer does indeed speed up execution as confirmed by a profiler. – Nick Mar 07 '09 at 00:18
  • Actually you CAN fake the bits in memory to represent a byte[]. Check out my answer to see how it's done. – Omer Mor Aug 29 '10 at 05:37
1

Using the new Span<> in .Net Core 2.1 or later...

byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();

Or, if a Span can be used instead of an array, a direct reinterpret cast is possible (very fast, zero copying):

Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
// With a span we can get a byte, set a byte, iterate, and more.
byte someByte = byteSpan[2];
byteSpan[2] = 33;

I did some crude benchmarks. The time taken for each is in the comments. [release/no debugger/x64]

float[] floatArray = new float[100];
for (int i = 0; i < 100; i++) floatArray[i] = i *  7.7777f;
Stopwatch start = Stopwatch.StartNew();
for (int j = 0; j < 100; j++)
{
    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
    }
    long timeTaken1 = start.ElapsedTicks; ////// 0 ticks  //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
    }
    long timeTaken2 = start.ElapsedTicks; //////  26 ticks  //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
        for (int i = 0; i < floatArray.Length; i++)
            BitConverter.GetBytes(floatArray[i]).CopyTo(byteArray, i * sizeof(float));
    }
    long timeTaken3 = start.ElapsedTicks;  //////  1310  ticks //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
        Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
    }
    long timeTaken4 = start.ElapsedTicks;  ////// 33 ticks  //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
        MemoryStream memStream = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(memStream);
        foreach (float value in floatArray)
            writer.Write(value);
        writer.Close();
    }
    long timeTaken5 = start.ElapsedTicks;   ////// 1080 ticks   //////

    Console.WriteLine($"{timeTaken1/10,6} {timeTaken2 / 10,6} {timeTaken3 / 10,6} {timeTaken4 / 10,6} {timeTaken5 / 10,6} ");
}
SunsetQuest
  • 1
    Neat, hadn't seen that one yet. Though you could also use `.AsBytes()` which might have slightly lower overheads since it doesn't need to validate the destination type & span lengths. – Jeremy Lakeman May 09 '22 at 02:42
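
To connect the span approach back to the original goal of writing the floats out, here is a rough sketch that reinterprets the array with `MemoryMarshal.AsBytes` (the `.AsBytes()` route mentioned in the comment above) and passes the span straight to a stream. It assumes .NET Core 2.1 or later, where `Stream.Write(ReadOnlySpan<byte>)` is available; the class name `FloatSpanWriter` is a hypothetical name used only for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class FloatSpanWriter
{
    // Hypothetical helper: views the float[] as bytes without copying
    // and writes that view to the stream in a single call.
    public static void Write(Stream output, float[] source)
    {
        // AsBytes reinterprets the same memory; no new array is allocated.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(source.AsSpan());
        output.Write(bytes);
    }
}
```

No intermediate byte[] is allocated here, although the stream implementation may still copy the data into its own buffer internally.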
0

We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:

    static public byte[] ConvertFloatsToBytes(float[] data)
    {
        int n = data.Length;
        byte[] ret = new byte[n * sizeof(float)];
        if (n == 0) return ret;

        unsafe
        {
            fixed (byte* pByteArray = &ret[0])
            {
                float* pFloatArray = (float*)pByteArray;
                for (int i = 0; i < n; i++)
                {
                    pFloatArray[i] = data[i];
                }
            }
        }

        return ret;
    }
-3

Although it still loops behind the scenes, this does the job in one statement:

byte[] byteArray = floatArray.Select(
                    f=>System.BitConverter.GetBytes(f)).Aggregate(
                    (bytes, f) => {List<byte> temp = bytes.ToList(); temp.AddRange(f); return temp.ToArray(); });
Jacob Adams
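
As a side note on the one-liner above: the Aggregate call rebuilds a list and an array on every step, so its cost grows quadratically with the array length. A possible lighter variant, sketched here with the hypothetical names `FloatLinq` and `ToBytes`, flattens the per-float byte arrays with `SelectMany` instead:

```csharp
using System;
using System.Linq;

public static class FloatLinq
{
    // Hypothetical helper: converts each float to its 4 bytes and
    // concatenates the results in order into one array.
    public static byte[] ToBytes(float[] source) =>
        source.SelectMany(f => BitConverter.GetBytes(f)).ToArray();
}
```

This still allocates a small byte[] per element, so it is likely to be much slower than Buffer.BlockCopy for large arrays; it only keeps the one-statement form.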