0

I've got an array containing millions of bytes. These bytes are int values (Int16, Int24 or Int32). Now I want to get the x-bytes with the max int value out of an amount of bytes.

So to explain this better, lets imagine an array with 10 entries:

byte[] arr = {255, 10, 55, 60, 128, 90, 88, 66, 199, 56};

I will know if we use In16, Int24 or Int32, so for this example, lets imagine we are using Int16. This means, we use 2 bytes to represent an Int16. So the Ints consist of:

{255, 10},
{55, 60},
{128, 90},
{88, 66},
{199, 56}

Problem1: Because this is needed for audio processing, 1046 is lower than -2096. So there is a need to compare independent of negativity

Problem2: Because this needs to be very performant, converting the bytes into Ints for comparing seems inefficient and there should be an other way.

This is the starting point:

    /// <summary>
    /// Gets the maximum value of a number of bytes representing Int-Values
/// </summary>
/// <returns>The channels.</returns>
/// <param name="leftChannel">Left channel.</param>
/// <param name="rightChannel">Right channel.</param>
/// <param name="bytesPerInt">Bytes per int. 2 bytes = Int16, 3 bytes = Int24, 4 bytes = Int32</param>
/// <param name="countBytesToCombine">The number of bytes to look for the highest value</param>
private (byte[] combinedLeft, byte[] combinedRight) CombineChannels(byte[] leftChannel, byte[] rightChannel, int bytesPerInt, int countBytesToCombine)
{

}

/// <summary>
/// Gets the highest byte[] value 
/// </summary>
/// <returns>The highest value. The size of the byte array is equal the bytesPerInt</returns>
/// <param name="bytes">A subarray of the given byte array of the upper method. The size of this array is equals countBytesToCombine</param>
/// <param name="bytesPerInt">The count of bytes representing an Int</param>
private byte[] GetHighestValue(byte[] bytes, int bytesPerInt)
{

}

Edit2

This is a working solution but it takes about 2 seconds to execute with 14 million bytes for each channel which is way too far.

    /// <summary>
    /// Gets the maximum value of a number of bytes representing Int-Values
    /// </summary>
    /// <returns>The channels.</returns>
    /// <param name="leftChannel">Left channel.</param>
    /// <param name="rightChannel">Right channel.</param>
    /// <param name="bytesPerInt">Bytes per int. 2 bytes = Int16, 3 bytes = Int24, 4 bytes = Int32</param>
    /// <param name="countValuesToCombine">The number of bytes to look for the highest value</param>
    private (byte[] combinedLeft, byte[] combinedRight) CombineChannels(byte[] leftChannel, byte[] rightChannel, int bytesPerInt, int countValuesToCombine)
    {
        var cLeft = new List<byte>();
        var cRight = new List<byte>();

        for (int i = 0; i < leftChannel.Length; i += countValuesToCombine * bytesPerInt)
        {
            var arrLeft = SubArray(leftChannel, i, countValuesToCombine * bytesPerInt);
            var arrRight = SubArray(rightChannel, i, countValuesToCombine * bytesPerInt);

            cLeft.AddRange(GetHighestValue(arrLeft, bytesPerInt));
            cRight.AddRange(GetHighestValue(arrRight, bytesPerInt));
        }

        return (cLeft.ToArray(), cRight.ToArray());
    }

    /// <summary>
    /// Gets the highest byte[] value 
    /// </summary>
    /// <returns>The highest value.</returns>
    /// <param name="bytes">Bytes.</param>
    /// <param name="bytesPerInt">The count of bytes representing an Int</param>
    private byte[] GetHighestValue(byte[] bytes, int bytesPerInt)
    {
        byte[] bytesOfHighestValue = new byte[bytesPerInt];

        for (int i = 0; i < bytes.Length; i += bytesPerInt)
        {
            var arr = SubArray(bytes, i, bytesPerInt);

            if (IsValueHigher(arr, bytesOfHighestValue, bytesPerInt))
            {
                bytesOfHighestValue = arr;
            }
        }

        return bytesOfHighestValue;
    }

    private bool IsValueHigher(byte[] one, byte[] two, int bytesPerInt)
    {
        var o = ConvertToInt(one, bytesPerInt);
        var t = ConvertToInt(two, bytesPerInt);

        return Math.Abs(o) > Math.Abs(t);
    }

    private int ConvertToInt(byte[] bytes, int bytesPerInt)
    {
        switch (bytesPerInt)
        {
            case 2:
                return BitConverter.ToInt16(bytes, 0);
            case 3:
                return Int24.ToInt32(bytes, 0);
            case 4:
                return BitConverter.ToInt32(bytes, 0);
        }

        return 0;
    }

This is extremely difficult to explain so please ask if there are questions before downvoting.

DirtyNative
  • 2,553
  • 2
  • 33
  • 58
  • shouldn't `GetHighestValue` return an Int16 for example? – Ashkan Mobayen Khiabani Oct 07 '18 at 15:19
  • This method is an example which will be called inside `CombineChannels`. If there is no performance lag, this can also return an Int16 or else. But I thought, before converting and opening a new bottleneck, lets return the 2 bytes representing this Int16 value – DirtyNative Oct 07 '18 at 15:23
  • well if you want to return byte[] then it will be byte[][] or list of byte[][], for example byte[10] would become List wich has five items. – Ashkan Mobayen Khiabani Oct 07 '18 at 15:25
  • I think I understand what your problem is but I do not know what you are asking. Have you tried solving the problem yourself? Or are you just asking on how to approach the problem from a performance perspective? – a-ctor Oct 07 '18 at 15:38
  • I was able to solve this by myself but it was so unperformant that I deleted the code, so I wanted some help to make this usable in a real world scenario. So yes, I need help with the performance. – DirtyNative Oct 07 '18 at 15:41
  • @HABO This was the way I did earlier but this needs the bytes to be converted to an Int, and this causes performance lags. – DirtyNative Oct 07 '18 at 15:44
  • "1046 is lower than -2096" Explain. Is this a "unsigned value seen as signed value" problem? Because that can be solved simply by not using signed values... – Nyerguds Oct 07 '18 at 16:47
  • @Nyerguds I need to get the max amplitude. An amplitude is positive or negative. The higher the value, the louder is the part of an audiofile. Keeping this in mind, it is irrelevant if the value is positive or negative, -2096 is always higher (louder) than 1046 – DirtyNative Oct 07 '18 at 16:53
  • So it's just absolute value you need. I see. In terms of optimisation, that's simply ignoring / clearing the highest bit. – Nyerguds Oct 07 '18 at 17:00
  • Did you try using `System.IO.BinaryReader`? It has specific methods for reading a `Byte`, `Int16` and `Int32`, and is specifically specced as reading everything as little-endian. No conversion needed, and it automatically advances its read pointer. – Nyerguds Oct 11 '18 at 19:24

3 Answers3

1

Ok so here is a straightforward implementation for 4 byte integers:

private static int GetHighestValue(byte[] data)
{
  if (data.Length % 4 != 0)
     throw new ArgumentException();

  var maximum = 0, maximumAbs = 0;
  for (var i = 0; i < data.Length; i+=4)
  {
    var current = BitConverter.ToInt32 (data, i);
    var currentAbs = Math.Abs(current);

    if (currentAbs > maximumAbs)
    {
      maximum = current;
      maximumAbs = currentAbs;
    }
  }

  return maximum;
}

Running this on a byte[] with 1 million bytes it takes about 3ms while compiling with Debug.

I do not know what kind of speeds you are aiming at but for 99% of cases this should be fine.


Edit: Since you updated your question and included sample code here is an update:

These are some areas I that make your code slower than it needs to be:

  • We do not need to create sub arrays in every iteration of CombineChannels. We can rewrite GetHighestValue so that it takes the array, offset and amount as parameter.

  • Instead of having one CombineChannels method we should split it up into the different byte sizes. For example CombineChannelsInt32, CombineChannelsInt16 ... This way the methods itself can store the maximum as int32/int16/... without having to convert them at every iteration.

So here are the methods we would end up with something like this:

(byte[] combinedLeft, byte[] combinedRight) CombineChannels(byte[] leftChannel, byte[] rightChannel, int bytesPerInt, int countValuesToCombine)
{
  switch(bytesPerInt)
  {
    case 2:
      return CombineChannelsInt16(leftChannel, rightChannel, countValuesToCombine);
    case 3:
      return CombineChannelsInt24(leftChannel, rightChannel, countValuesToCombine);
    case 4:
      return CombineChannelsInt32(leftChannel, rightChannel, countValuesToCombine);
  }
}

(byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt16(byte[] leftChannel, byte[] rightChannel, int countValuesToCombine);
(byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt24(byte[] leftChannel, byte[] rightChannel, int countValuesToCombine);
(byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt32(byte[] leftChannel, byte[] rightChannel, int countValuesToCombine);

short GetHighestValueInt16(byte[] bytes, int offset, int amount);
Int24 GetHighestValueInt24(byte[] bytes, int offset, int amount);
int GetHighestValueInt32(byte[] bytes, int offset, int amount);
a-ctor
  • 3,568
  • 27
  • 41
0

I made a method what returns the max index. It compares the highest bytes first and when equal goes to the lower. With bigger ints it works even faster.

static int getMaxIndex(byte[] data, int byteLenght)
        {
            int MaxIndex = 0;
            int signMax = data[byteLenght - 1] >> 7;// get sign
            for (int i = byteLenght; i < data.Length; i += byteLenght)
            {
                int step = byteLenght - 1;
                int compResult = 0;

                while (compResult == 0 && step > -1)
                {
                    if (step == byteLenght -1)
                    {
                        int signData = data[i + step] >> 7;

                        compResult = signData - signMax;
                        if (compResult == 0) compResult = data[MaxIndex + step] & 127 - data[i + step] & 127;
                    }
                    else compResult = data[MaxIndex + step] - data[i + step];
                    if (compResult < 0)
                    {
                        MaxIndex = i;
                        signMax = data[MaxIndex + step] >> 7;
                    }
                    step--;
                }
            }
            return MaxIndex;
        }
Aldert
  • 4,209
  • 1
  • 9
  • 23
  • This method does not match the **greater than** check I mentioned earlier. 11111111 11111111 is interpreted as the highest number, but actually this is -1 – DirtyNative Oct 07 '18 at 19:00
  • But than you do want to take negativity into account. You can adjust the method by looking at highest bit. – Aldert Oct 07 '18 at 19:03
0

As was mentioned a few times already, avoid "if" statements inside your read; just make a separate function for reading Int16, Int24 and Int32, and select which one to use in advance.

Personally I'd use the System.IO.BinaryReader for this; it already contains functions for reading integers off streams, and unlike BitConverter, which technically depends on system endianness, BinaryReader is actually guaranteed to read values as little-endian; it's in the MSDN specs.

Here is the basic function to use the BinaryReader, using Int32 as example. In this version I let the EndOfStreamException take care of the end. They say exception throwing / handling is quite a heavy operation, but in this case it replaces a lot of checks between reads, so it might be justified.

You could adapt that by replacing the while (true) with an actual check on the stream pointer. It's either just checking ms.Position against the input byte array's length, or keeping track of the location in your own variable you increment by the read amount of bytes in each step.

public static Int32 GetHighestValueInt32(Byte[] bytes)
{
    Int32 maxval = 0;
    try
    {
        using (MemoryStream ms = new MemoryStream(bytes))
        using (BinaryReader reader = new BinaryReader(ms))
        {
            while (true)
            {
                // Clear highest bit so the value's always a positive Int32.
                Int32 val = (Int32)(reader.ReadUInt32() & 0x7FFFFFFF);
                if (val > maxval)
                    maxval = val;
            }
        }
    }
    catch (EndOfStreamException ex)
    {
        // Finished reading!
    }
    return maxval;
}

For Int16, the actual line reading val should simply be replaced by

Int16 val = (Int16)(reader.ReadUInt16() & 0x7FFF);

And maxval and the return type should likewise be changed to Int16.

BinaryReader can't natively read an Int24 off the stream, though. But the workaround for that isn't too hard. You can simply use Int32 and shift it down by 8 bits, and then adapt the stream pointer manually to compensate for the two extra read bytes:

while (true)
{
    Int32 val = (Int32)((reader.ReadUInt32() >> 8) & 0x7FFFFF);
    ms.Position -= 2;
    if (val > maxval)
        maxval = val;
}
Nyerguds
  • 5,360
  • 1
  • 31
  • 63