I have a large byte array (around 50 KB) and I need to extract numeric values from it. Every three bytes represent one value. I tried LINQ's Skip and Take, but that is really slow given the size of the array.

This is my very slow routine:

List<int> ints = new List<int>();
for (int i = 0; i <= fullFile.Count(); i+=3)
{
    ints.Add(BitConverter.ToInt16(fullFile.Skip(i).Take(i + 3).ToArray(), 0));
}

I think my approach to this is wrong.

Elias Johannes
  • Does this answer your question? [Splitting a byte\[\] into multiple byte\[\] arrays in C#](https://stackoverflow.com/questions/11816295/splitting-a-byte-into-multiple-byte-arrays-in-c-sharp) You can also try `Span – Pavel Anikhouski Jan 21 '20 at 18:36
  • I am pretty sure your loop enumerates the `IEnumerable` on every iteration – Emanuel Vintilă Jan 21 '20 at 18:57
  • @EmanuelVintilă I'm pretty sure it's not ;) https://stackoverflow.com/questions/2521592/difference-between-ienumerable-count-and-length – jgauffin Jan 21 '20 at 19:24
  • @jgauffin I meant with `Skip` and `Take`. But I guess we'll never know until we see the actual type of the `IEnumerable` in question – Emanuel Vintilă Jan 21 '20 at 19:30

1 Answer

Your code

First of all, ToInt16 only uses two bytes. So your third byte will be discarded.

You can't use ToInt32 either, as it reads four bytes and would include one extra byte that belongs to the next value.
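To see both points in isolation, here is a minimal sketch. The class name `ToInt16Demo` and the sample bytes are made up for illustration, and the printed numbers assume a little-endian machine, since BitConverter follows the machine's byte order.

```csharp
using System;

static class ToInt16Demo
{
    // Hypothetical sample: one three-byte value (01 02 03, least significant
    // byte first) followed by the first byte of the next value (04).
    public static readonly byte[] Sample = { 0x01, 0x02, 0x03, 0x04 };

    // Reads Sample[0] and Sample[1] only; Sample[2] never contributes.
    public static short ReadTwoBytes() => BitConverter.ToInt16(Sample, 0);

    // Reads Sample[0..3], so the byte of the next value leaks in.
    public static int ReadFourBytes() => BitConverter.ToInt32(Sample, 0);

    // The intended three-byte value, assembled by hand (little-endian).
    public static int ReadThreeBytes() =>
        Sample[0] | (Sample[1] << 8) | (Sample[2] << 16);

    static void Main()
    {
        Console.WriteLine(ReadTwoBytes());   // 513 (0x0201) on a little-endian machine
        Console.WriteLine(ReadFourBytes());  // 67305985 (0x04030201) on a little-endian machine
        Console.WriteLine(ReadThreeBytes()); // 197121 (0x030201)
    }
}
```

Neither BitConverter call yields the three-byte value, which is why the approaches further down either pad to four bytes or assemble the value by hand.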

Let's review this:

fullFile.Skip(i).Take(i + 3).ToArray()

...and take a careful look at Take(i + 3). It asks for a larger and larger buffer on every iteration. For instance, when i is 32000 you ask for up to 32003 bytes (limited only by what is left in the array) to be copied into a new buffer.

That's why the code is quite slow.

The code is also slow because it allocates a new byte buffer on every iteration, and each of them has to be garbage collected. For a 50 KB file that is roughly 17,000 temporary buffers, most of them far larger than the three bytes you actually need.
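The effect of Take(i + 3) is easy to check on a small array. The sketch below uses a made-up 30-byte array and hypothetical helper names (`ChunkLength`, `FixedChunkLength`); it only measures how many bytes each expression copies.

```csharp
using System;
using System.Linq;

static class SkipTakeDemo
{
    // Length of the temporary array produced by the expression from the question.
    public static int ChunkLength(byte[] data, int i) =>
        data.Skip(i).Take(i + 3).ToArray().Length;

    // Same expression with a fixed Take(3), which is probably what was intended.
    public static int FixedChunkLength(byte[] data, int i) =>
        data.Skip(i).Take(3).ToArray().Length;

    static void Main()
    {
        var data = new byte[30]; // hypothetical small stand-in for the file

        Console.WriteLine(ChunkLength(data, 0));      // 3
        Console.WriteLine(ChunkLength(data, 9));      // 12: the copy grows with i
        Console.WriteLine(FixedChunkLength(data, 9)); // 3
    }
}
```

Even with Take(3) the loop would still allocate one small array per value, so this only removes the growing copies, not the allocations.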

You could also have done it like this, reusing a single work buffer:

List<int> ints = new List<int>();
var workBuffer = new byte[4];
// Stop while a full three-byte group is still available
for (int i = 0; i + 3 <= fullFile.Length; i += 3)
{
    // Copy the three bytes into the beginning of the temp buffer
    Buffer.BlockCopy(fullFile, i, workBuffer, 0, 3);

    // The fourth byte is never written and stays zero, so ToInt32 works
    // (assumes little-endian data on a little-endian machine)
    var value = BitConverter.ToInt32(workBuffer, 0);

    ints.Add(value);
}

Quite easy to understand, but not the fastest code.

A better solution

The most efficient way is to do the conversion yourself with bit shifting.

Something like:

List<int> ints = new List<int>();
// Stop while a full three-byte group is still available
for (int i = 0; i + 2 < fullFile.Length; i += 3)
{
    // This code assumes little-endian byte order
    var value = (fullFile[i + 2] << 16)
                + (fullFile[i + 1] << 8)
                + fullFile[i];
    ints.Add(value);
}

This code does not allocate anything extra (apart from the ints list) and should be quite fast.

You can read more about shift operators on MSDN, and about endianness.

jgauffin
  • That's a good lesson on being careful with LINQ expressions! Your bit-shifting code is blazing fast, but right now I'm getting wrong values from it. My sequence is like `[0,0,246,0,0,246,0,0,246,...]`, so the resulting int should be 246, but if I shift them like this I get a huge int (maybe the bits are shifted the wrong way?). By the way, my debug machine is little endian; I checked with `BitConverter.IsLittleEndian` – Elias Johannes Jan 21 '20 at 19:15
  • Your file looks like big endian. Move the `+ 2` to the last byte (i.e. reverse the order of the bytes). Dodgy file formats are always tricky. – jgauffin Jan 21 '20 at 19:21
  • Yes, you are right, reversing the order gives me the correct values! – Elias Johannes Jan 21 '20 at 19:26
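
For reference, the reversed (big-endian) variant discussed in the comments could look roughly like the sketch below. The class and method names (`UInt24Reader`, `ReadBigEndian`) are made up, and it assumes the values are unsigned 24-bit numbers; trailing bytes that do not form a full group are ignored.

```csharp
using System;
using System.Collections.Generic;

static class UInt24Reader
{
    // Reads consecutive three-byte groups, most significant byte first.
    public static List<int> ReadBigEndian(byte[] data)
    {
        // Pre-size the list so it does not have to grow while filling.
        var values = new List<int>(data.Length / 3);

        // i + 2 < data.Length guarantees a full group is available.
        for (int i = 0; i + 2 < data.Length; i += 3)
        {
            int value = (data[i] << 16)
                        | (data[i + 1] << 8)
                        | data[i + 2];
            values.Add(value);
        }

        return values;
    }

    static void Main()
    {
        // Sample taken from the sequence quoted in the comments.
        var sample = new byte[] { 0, 0, 246, 0, 0, 246, 0, 0, 246 };
        Console.WriteLine(string.Join(",", ReadBigEndian(sample))); // 246,246,246
    }
}
```

With the little-endian order the same input would give 246 << 16 = 16121856 per value, which matches the "huge int" described above.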