2

I'm looking for a faster alternative to BitConverter.ToUInt64 for use inside a hot loop:

//i_k_size = 8 bytes 
while (fs.Read(ba_buf, 0, ba_buf.Length) > 0 && dcm_buf_read_ctr < i_buf_reads)
{
    Span<byte> sp_data = ba_buf.AsSpan();
    for (int i = 0; i < ba_buf.Length; i += i_k_size)
    {
        UInt64 k = BitConverter.ToUInt64(sp_data.Slice(i, i_k_size));
    }
}

My attempts to combine a pointer with the conversion made performance worse. Can a pointer be used together with the span to make it faster?

Below is the benchmark: reading through a pointer to the array is about 2x faster than BitConverter.

This is the approach I would actually like to use instead of BitConverter:

public static int l_1gb = 1073741824;
static unsafe void Main(string[] args)
{
    Random rnd = new Random();
    Stopwatch sw1 = new();
    sw1.Start();
    byte[] k = new byte[8];

    fixed (byte* a2rr = &k[0])
    {
        for (int i = 0; i < 1000000000; i++)
        {
            rnd.NextBytes(k);
            //UInt64 p1 = BitConverter.ToUInt64(k); 
            //time: 10203.824
            //time: 10508.981
            //time: 10246.784
            //time: 10285.889

            //UInt64* uint64ptr = (UInt64*)a2rr;
            //x2 performance !
            UInt64 p2 = *(UInt64*)a2rr;

            //time: 4609.814
            //time: 4588.157
            //time: 4634.494
        }
    }
    Console.WriteLine($"time: {Math.Round(sw1.Elapsed.TotalMilliseconds, 3)}");
}
  • Is `i_k_size` equal to `sizeof(UInt64)`? (i.e. is it 8?) – Matthew Watson Jan 17 '23 at 15:49
  • 2
    As an aside, the inconsistent bracing, unconventional indentation and unconventional variable names are somewhat distracting - at least for me, and I suspect others may feel the same. – Jon Skeet Jan 17 '23 at 15:50
  • @MatthewWatson Yes. i_k_size = 8 bytes – Yurii Palkovskii Jan 17 '23 at 15:51
  • @JonSkeet applied additional spaces, but SO editor leaves much to be desired. a) I see no problems in braces. b) codestyles can be different I think c) I wish you replied to the point of the question – Yurii Palkovskii Jan 17 '23 at 15:53
  • Ouch. You discard return value of `fs.Read` (test but not store) so you can't use the correct size at the end if the last read doesn't copy exactly `ba_buf.Length` bytes. – madreflection Jan 17 '23 at 15:54
  • @madreflection the code presupposes the exact input - so this is intentional – Yurii Palkovskii Jan 17 '23 at 15:56
  • 2
    "I see no problems in braces" - the "brace at end of line" for the `while` loop vs the "brace at start of line" for the `for` loop doesn't seem inconsistent to you? (And as for "I wish you replied to the point of the question" - if my suggestions help to make your question more appealing to 10 other users, isn't that actually more useful?) – Jon Skeet Jan 17 '23 at 15:56
  • 2
    Yes, it "presupposes" it. That's clear. But if you don't get what you expect, you have no error checking so it will silently give you bad data. – madreflection Jan 17 '23 at 15:57
  • @madreflection I corrected this, but - I think you can spend your time more efficiently :-) – Yurii Palkovskii Jan 17 '23 at 15:59
  • 2
    The loss of efficiency was in the pushback you gave. – madreflection Jan 17 '23 at 15:59
  • 1
    @YuriiPalkovskii, you actually cannot assume that you get exactly as many characters as you request from `Stream.Read` in .Net Core 3+ (breaking change from .Net Framework). You *need* to understand how many characters `Read` returned, and possibly keep calling `Read` until you get all your data (or just use a `BinaryReader` on top of your stream, which guarantees it). – Blindy Jan 17 '23 at 16:01
  • @madreflection adding eof condition will result in an additional code - this is a very hot loop - so my architectural choice was _deliberate_ removal of this check. – Yurii Palkovskii Jan 17 '23 at 16:02
  • 1
    I never said to add an EOF condition. You're making far more assumptions than you even know. – madreflection Jan 17 '23 at 16:03
  • Dear @Blindy this is a closed test-only environment. - So far 4 bln iterations give exactly the requested bytes – Yurii Palkovskii Jan 17 '23 at 16:04
  • It only needs to fail once, and it *will* fail. It is not theoretical, you will crash on this eventually. This is the very definition of a brittle piece of software. – Blindy Jan 17 '23 at 16:05
  • 3
    How do you know? If `fs.Read` returns more than `0` but less than `ba_buf.Length`, you have no idea because you haven't saved the actual count of bytes read. – madreflection Jan 17 '23 at 16:05
  • @madreflection - u r right (I checked the func desc). will redo the code. – Yurii Palkovskii Jan 17 '23 at 16:09
  • 3
    Imagine the greater efficiency if you had done that after my *first* comment. And Blindy's. – madreflection Jan 17 '23 at 16:11

2 Answers

5

Assuming ba_buf is a byte[], a very easy and efficient way to run your loop is this:

foreach(var value in MemoryMarshal.Cast<byte, ulong>(ba_buf))
   // work with value here 

If you need to finesse the buffer (for example, to cut off parts of it), use AsSpan(start, count) on it first.
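
As a rough, self-contained sketch of the idea (the sample data and the class name `CastDemo` are made up for illustration, not taken from the question), the following checks that the reinterpreted span yields the same values as per-element `BitConverter.ToUInt64` calls, and shows `AsSpan(start, count)` trimming the buffer before the cast:

```csharp
using System;
using System.Runtime.InteropServices;

class CastDemo
{
    static void Main()
    {
        // Hypothetical sample data: three 8-byte values written back to back.
        byte[] ba_buf = new byte[24];
        BitConverter.GetBytes(1UL).CopyTo(ba_buf, 0);
        BitConverter.GetBytes(2UL).CopyTo(ba_buf, 8);
        BitConverter.GetBytes(ulong.MaxValue).CopyTo(ba_buf, 16);

        // Reinterpret the same memory as ulongs; nothing is copied.
        Span<ulong> values = MemoryMarshal.Cast<byte, ulong>(ba_buf.AsSpan());

        for (int i = 0; i < values.Length; i++)
        {
            // Per-element conversion, as in the question's loop, for comparison.
            ulong viaBitConverter = BitConverter.ToUInt64(ba_buf, i * sizeof(ulong));
            Console.WriteLine(values[i] == viaBitConverter);
        }

        // Skip the first 8 bytes, then cast the remaining 16 bytes.
        Span<ulong> tail = MemoryMarshal.Cast<byte, ulong>(ba_buf.AsSpan(8, 16));
        Console.WriteLine(tail.Length); // 16 bytes / 8 bytes per ulong
    }
}
```

Both paths read the bytes in the machine's native byte order, so they agree with each other; if the file format fixes a byte order, that would need separate handling.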

Blindy
  • 4 seconds faster (my test setup) than BitConverter.ToUInt64 for 1073741824 conversions. Still, I'm not sure if this is faster than pointer-span (if possible) – Yurii Palkovskii Jan 17 '23 at 18:01
  • I've added benchmark code to support my opinion. – Yurii Palkovskii Jan 17 '23 at 18:08
  • 2
    Don't need opinions, you can [check the code directly](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAJI8YUCAAcAymIBuvMDFwBuGjWIBmJqQYBhBjQDeNBqaZaOAGwhcA5gwCyACjYxsAEwDyXSwE8pEthcADzAvhgwAHwM7tgY2ACUJmbG1GbpDLLYUAy4HPgMALwMdACqADKqaRmmAGbQbmAAFk5ZOVmWHDAMvFyOMPjQvg7ZuE3Yliy6eBih4TBoDFY2tpFOsfEJSdU16XkFANTFHV1Vu0wA7Ln5Z2YAvjR3QA==). This is as fast as it gets. Any other performance problems come from questionable decisions like using streams for very large files, especially with tiny reads. – Blindy Jan 17 '23 at 18:10
  • whoa! online disasm is fantastic! Never seen that before! Thanks for the link! – Yurii Palkovskii Jan 17 '23 at 18:19
2

You can optimise this quite a lot by initialising the spans outside the reading loop, reading directly into a Span<byte>, and accessing the data via a Span<ulong>, like so:

int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf    = new byte[buf_bytes];
var span_buf  = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);

while (true)
{
    int count = fs.Read(span_buf) / sizeof(ulong);

    if (count == 0)
        break;

    for (int i = 0; i < count; i++)
    {
        // Do something with data_span[i]

        Console.WriteLine(data_span[i]); // Put your own processing here.
    }
}

This avoids memory allocation as much as possible. It terminates the reading loop when it runs out of data, and if the number of bytes returned is not a multiple of sizeof(ulong) it ignores the extra bytes.

It will always read all the available data, but if you want to terminate it earlier you can add code to do so.

As an example, consider this code which writes 2,000 ulong values to a file and then reads them back in using the code above:

using (var output = File.OpenWrite("x"))
{
    for (ulong i = 0; i < 2000; ++i)
    {
        output.Write(BitConverter.GetBytes(i));
    }
}

using var fs = File.OpenRead("x");

int buf_bytes = sizeof(ulong) * 1024; // Or whatever buffer size you need.
var ba_buf    = new byte[buf_bytes];
var span_buf  = ba_buf.AsSpan();
var data_span = MemoryMarshal.Cast<byte, ulong>(span_buf);

while (true)
{
    int count = fs.Read(span_buf) / sizeof(ulong);

    if (count == 0)
        break;

    for (int i = 0; i < count; i++)
    {
        // Do something with data_span[i]

        Console.WriteLine(data_span[i]); // Put your own processing here.
    }
}
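
One caveat with the loop above: `Stream.Read` may return fewer bytes than requested even before the end of the stream, and if that count is not a multiple of `sizeof(ulong)` the dropped bytes would leave the following reads misaligned. A minimal sketch of one way around that, assuming a hypothetical helper named `ReadFull` (not part of the answer) and a `MemoryStream` standing in for the file:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class FullReads
{
    // Hypothetical helper: keeps calling Read until the buffer is full
    // or the stream ends, and returns the number of bytes actually read.
    static int ReadFull(Stream stream, Span<byte> buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer.Slice(total));
            if (n == 0)
                break; // end of stream
            total += n;
        }
        return total;
    }

    static void Main()
    {
        // In-memory stand-in for the file: five ulong values (0, 10, 20, 30, 40).
        var bytes = new byte[5 * sizeof(ulong)];
        var source = MemoryMarshal.Cast<byte, ulong>(bytes.AsSpan());
        for (int i = 0; i < source.Length; i++)
            source[i] = (ulong)i * 10;

        using var fs = new MemoryStream(bytes);

        // Deliberately small buffer (two values) so the last read is short.
        var ba_buf = new byte[2 * sizeof(ulong)];

        while (true)
        {
            int read = ReadFull(fs, ba_buf);
            if (read == 0)
                break;

            // Cast only the bytes that were filled; a trailing partial value is ignored.
            var data_span = MemoryMarshal.Cast<byte, ulong>(ba_buf.AsSpan(0, read));
            foreach (ulong value in data_span)
                Console.WriteLine(value);

            if (read < ba_buf.Length)
                break; // short read means the stream has ended
        }
    }
}
```

On .NET 7 and later, the built-in `Stream.ReadAtLeast` and `Stream.ReadExactly` methods cover the same need without a hand-written helper.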
Matthew Watson
  • 1
    I'm sorry - at first, I missed the most important point in your code: data_span + MemoryMarshal.Cast. I've tested and benchmarked your approach, and it is so far the _most_ performant on 2147483648 iterations: (1) my code with BitConverter = 192 seconds; (2) foreach(var value in MemoryMarshal.Cast<byte, ulong>(ba_buf)) = 187 seconds; (3) MemoryMarshal.Cast + outer buffer spans (your code) = 183 seconds. Thank you so much! – Yurii Palkovskii Jan 17 '23 at 19:05