
I need to optimize this function, which gets called frequently:

        private static InterstingDataValues CalculateFor(IData data)
        {
            InterstingDataValues dataValues = new InterstingDataValues(null);
            float[] pixels = data.ReadAsFloatBuffer();
            if (pixels == null)
            {
                return null;
            }
            float value1 = pixels[0];
            if (float.IsNaN(value1))
            {
                return null;
            }

            dataValues.HighestIntensityInData = float.MinValue;
            dataValues.LowestIntensityInData = float.MaxValue;

            for (int i = 0; i < pixels.Length; ++i)
            {
                float pixelf = pixels[i];
                if (float.IsNaN(pixelf))
                {
                    pixelf = 0;
                }
                dataValues.SumIntensity += (uint)pixelf;

                dataValues.HighestIntensityInData = Math.Max(dataValues.HighestIntensityInData, pixelf);
                dataValues.LowestIntensityInData = Math.Min(dataValues.LowestIntensityInData, pixelf);
            }
            dataValues.AverageIntensity = dataValues.SumIntensity / (uint)pixels.Count();

            if (double.IsNaN(dataValues.HighestIntensityInData))
            {
                dataValues.HighestIntensityInData = float.MaxValue;
            }
            if (double.IsNaN(dataValues.LowestIntensityInData))
            {
                dataValues.LowestIntensityInData = 0;
            }
            return dataValues;
        } 

I notice C# has built-in functions like

pixels.Max() 
pixels.Min() 
pixels.Sum()
pixels.Average()

which I would assume to be well optimized. However, my feeling is that calling these separately would be much less efficient than computing them all in one pass.

My current thinking is to send blocks of the array off to separate threads to get min/max/sum. Then, when I get the results for the blocks, I can run min/max/sum over the per-block results to merge them, as in the sketch below.
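
To make that concrete, here is a rough sketch of what I have in mind (illustrative names only, and it uses C# 7 value tuples for brevity):

    using System;
    using System.Threading.Tasks;

    static class BlockStats
    {
        // Split the array into blocks, compute min/max/sum per block on
        // separate tasks, then merge the per-block results.
        public static (float Min, float Max, double Sum) Compute(float[] pixels, int blockCount = 4)
        {
            int blockSize = (pixels.Length + blockCount - 1) / blockCount;
            var tasks = new Task<(float Min, float Max, double Sum)>[blockCount];

            for (int b = 0; b < blockCount; b++)
            {
                int start = b * blockSize;
                int end = Math.Min(start + blockSize, pixels.Length);
                tasks[b] = Task.Run(() =>
                {
                    float min = float.MaxValue, max = float.MinValue;
                    double sum = 0;
                    for (int i = start; i < end; i++)
                    {
                        float p = float.IsNaN(pixels[i]) ? 0 : pixels[i];
                        if (p < min) min = p;
                        if (p > max) max = p;
                        sum += p;
                    }
                    return (min, max, sum);
                });
            }

            Task.WaitAll(tasks);

            // Merge step: run min/max/sum over the per-block results.
            float totalMin = float.MaxValue, totalMax = float.MinValue;
            double totalSum = 0;
            foreach (var t in tasks)
            {
                totalMin = Math.Min(totalMin, t.Result.Min);
                totalMax = Math.Max(totalMax, t.Result.Max);
                totalSum += t.Result.Sum;
            }
            return (totalMin, totalMax, totalSum);
        }
    }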

But I have a feeling that C# will have some built-in way of doing this through Parallel.For. However, I get worried at answers to this due to the word "Interlocked" - I need to do some more digging into that, but I am wondering if I am on the right track.
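
For reference, the pattern that worries me looks roughly like this (my own sketch; Interlocked.Add has no float overload, so it uses ints):

    using System.Threading;
    using System.Threading.Tasks;

    int[] values = { 12, 15, 18, 64, 3, 68, 32 };
    int sum = 0;
    // Every iteration contends on the same shared variable:
    Parallel.For(0, values.Length, i => Interlocked.Add(ref sum, values[i]));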

Thanks, Chris

  • Did you see the second piece of code in that answer, showing the proper way of doing it? – Lasse V. Karlsen Apr 13 '15 at 11:24
  • Nonononononono :D First, figure out where the code is actually spending time. Your guess is probably way off - use a profiler. Second, the only simple performance improvement I can see is in the max and min - you're *always* assigning, whether the value changed or not. That *might* make a slight difference. But still, profile; guesses are not going to help you much these days :) This should be pretty easy to parallelize, just avoid shared state - let each of the parallel executions calculate its own part, and aggregate those at the end. It might help :) – Luaan Apr 13 '15 at 11:25
  • @LasseV.Karlsen, yes I did, but I still got worried at the word Interlocked: (x) => Interlocked.Add(ref sum, x). But if I'm not mistaken, he is doing exactly what I was proposing through that method? – chrispepper1989 Apr 13 '15 at 11:29
  • @Luaan I did do some profiling; I experimented with caching, and that helped a lot. However, the data is changing, so caches would be invalidated often. I should have been clearer, but this is the area that would be best to optimize. – chrispepper1989 Apr 13 '15 at 11:31
  • I doubt that you can optimize this in any way by going Parallel for (Sum/Min/Max) - instead, split the pixel array into parts, use the method you have to compute those parts in parallel, and finally merge them – Random Dev Apr 13 '15 at 11:33
  • There are more questions - which part of the method is actually taking the most time? Maybe it's the `ReadAsFloatBuffer`? The loop itself is relatively easy to parallelize (note my answer); other parts might not be. And of course, memory throughput might be a limiting factor, in which case the parallelization isn't going to help at all. – Luaan Apr 13 '15 at 12:14
  • @CarstenKönig That's what i meant by "My current thinking is to send blocks of the array off to separate threads to get min/max/sum" – chrispepper1989 Apr 13 '15 at 13:34
  • ReadAsFloatBuffer is definitely slower, but I can't do anything about that; it involves a GPU/CPU sync :( – chrispepper1989 Apr 13 '15 at 13:36

3 Answers


data.ReadAsFloatBuffer() seems to be the only redundant call, and eliminating it should be your priority. Look the data up in your loop instead of copying it into a fixed contiguous array first.
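
For illustration only - the question doesn't show IData's interface, so Length and GetValueAt here are hypothetical stand-ins for whatever per-element access it actually offers:

    // Hypothetical per-element access; avoids copying everything into a buffer first.
    float min = float.MaxValue, max = float.MinValue;
    double sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        float p = data.GetValueAt(i);   // hypothetical accessor, not part of the shown code
        if (float.IsNaN(p)) p = 0;
        if (p < min) min = p;
        if (p > max) max = p;
        sum += p;
    }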

– Hylaean

It is really easy to parallelize the loop, if that really is the part where you need it:

public void Main()
{
    int[] array = new int[] { 12, 15, 18, 64, 3, 68, 32 };
    object sync = new object();

    var results = new List<Result>();

    Parallel.ForEach
        (
            array,
            () => default(Result),
            (item, s, input) => input.Add(item),
            input => { lock (sync) results.Add(input); }
        );

    var aggregatedResult = results.Aggregate((acc, item) => acc.Add(item));

    aggregatedResult.Dump(); // LINQPad's Dump(); outside LINQPad, use Console.WriteLine instead
}

public struct Result
{
  public readonly int Sum;
  public readonly int? Min;
  public readonly int? Max;

  public Result(int sum, int min, int max)
  {
    Sum = sum;
    Min = min;
    Max = max;
  }

  public Result Add(int item)
  {
    return
        new Result
        (
            Sum + item, 
            Min.HasValue && Min.Value < item ? Min.Value : item, 
            Max.HasValue && Max.Value > item ? Max.Value : item
        );
  }

  public Result Add(Result partialResult)
  {
    // A null Min/Max means that side saw no items (the default(Result) seed),
    // so take whichever side actually has a value instead of falling back to 0.
    int? min = Min.HasValue && partialResult.Min.HasValue
        ? Math.Min(Min.Value, partialResult.Min.Value)
        : Min ?? partialResult.Min;
    int? max = Max.HasValue && partialResult.Max.HasValue
        ? Math.Max(Max.Value, partialResult.Max.Value)
        : Max ?? partialResult.Max;

    return new Result(Sum + partialResult.Sum, min.GetValueOrDefault(0), max.GetValueOrDefault(0));
  }
}

I'm not saying this is the best way (I particularly don't like the lock there; I'm sure there's a better way of doing that), but it's pretty straightforward. Note how everything but the aggregation of the final data is parallelized - and aggregating the few per-task structs together is going to be rather cheap compared to doing the same work across hundreds of thousands of items in the array.

Also note how you can use nullable types to handle edge cases like "there's no value" - default(float?) is a lot better than using NaN as a sentinel. Try to use the types that best describe your scenario.
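
For instance, a minimal sketch of tracking "no value yet" with a nullable instead of a NaN sentinel:

    // null means "no valid value seen yet" - no NaN/MinValue sentinels needed.
    float? min = null;
    foreach (float p in pixels)
    {
        if (float.IsNaN(p)) continue;   // skip missing data explicitly
        min = min.HasValue ? Math.Min(min.Value, p) : p;
    }
    // min == null here means the array contained no valid values.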

Also, I'm not sure how smart ForEach is about splitting the array into parts - it might be better to use For instead. The principle stays the same, though.
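
For illustration, a Parallel.For version of the same pattern might look like this (a sketch reusing the array and Result struct above; the localInit/localFinally overload keeps shared state down to one lock per task):

    object sync = new object();
    Result total = default(Result);

    Parallel.For(
        0, array.Length,
        () => default(Result),                       // localInit: per-task seed
        (i, state, local) => local.Add(array[i]),    // body: purely local work
        local => { lock (sync) total = total.Add(local); }  // localFinally: merge once per task
    );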

– Luaan

It may be too late for the original poster, but it might be useful for others. When asking a stupid question, I got a brilliant answer from Corey here: Parallel.For() with Interlocked.CompareExchange(): poorer performance and slightly different results to serial version

Corey drew my attention to the Enumerable.Aggregate method (https://msdn.microsoft.com/en-us/library/system.linq.enumerable.aggregate(v=vs.110).aspx), with which such tasks can be done in parallel without locking. As far as I understand it, you define an accumulator class to hold the partial data from each thread, then define a function to merge the partial results from the parallel threads. It is best shown with an example. The extension method below searches for the minimum and maximum values in an IList<double>:

    public static class Statistics
    {
        internal class ExtremumAccumulator
        {
            internal double Min;
            internal double Max;
        }

        /// <summary>
        /// An aggregate parallel query to return the minimum and the maximum of <paramref name="data"/> together, faster than two successive parallel queries for minimum and maximum.
        /// </summary>
        /// <param name="data">The list whose extrema we are to find.</param>
        /// <returns>A <see cref="Tuple{T1, T2}"/> instance whose <see cref="Tuple{T1, T2}.Item1"/> represents the minimum and whose <see cref="Tuple{T1, T2}.Item2"/> contains the maximum of <paramref name="data"/>.</returns>
        public static Tuple<double, double> Extrema(this IList<double> data)
        {
            ParallelQuery<double> query = data.AsParallel();

            return query.Aggregate(
                // Initialise accumulator:
                () => new ExtremumAccumulator() { Min = Double.MaxValue, Max = Double.MinValue },
                // Aggregate calculations:
                (accumulator, item) => { if (item < accumulator.Min) accumulator.Min = item; if (item > accumulator.Max) accumulator.Max = item; return accumulator; },
                // Combine accumulators:
                (accumulator1, accumulator2) => new ExtremumAccumulator() { Min = Math.Min(accumulator1.Min, accumulator2.Min), Max = Math.Max(accumulator1.Max, accumulator2.Max) },
                // Get result:
                accumulator => new Tuple<double, double>(accumulator.Min, accumulator.Max)
            );
        }
    }
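
A quick usage example (the sample values here are made up):

    // Assuming the Statistics class above is in scope:
    IList<double> samples = new List<double> { 3.2, -1.5, 8.9, 0.0, 4.4 };
    Tuple<double, double> extrema = samples.Extrema();
    Console.WriteLine("Min = " + extrema.Item1 + ", Max = " + extrema.Item2);
    // Prints: Min = -1.5, Max = 8.9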
– tethered.sun