0

I am fairly new to parallel programming; I am developing a program that deals with numbers in different bases and I want to parallelize the addition method.

Firstly, it computes the linear convolution and does the carrying later, thus every iteration of the for-loop is independent (Example):

(Little Endianness btw) [2,7,1] + [8,7,5] = [10,14,6] => [0,5,7]

The question is, can I, if there are threads available, make the addition process faster by having the iterations done at the same time in different threads, and how?

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • 2
    Don't confuse "asynchronous" with "parallel". Asynchronous is freeing up a thread while you wait for some external task to complete (network request, etc.). It's about how your code *waits*. Parallel means running two parts of your code at the same time on multiple threads. It's about how your code *runs*. You want parallel programming. – Gabriel Luci Jan 24 '22 at 19:29
  • Don't confuse "parallel" (which is what multiprocessing hardware offers) with "concurrent" (which is what _threads_ offer - in that different logical executors can maintain independent state permitting them to execute at the same time correctly.) – Wyck Jan 24 '22 at 19:46

2 Answers2

1

In case of huge arrays (threading has its own overhead), you can try Parallel, e.g. Parallel.For:

  int[] left  = ...
  int[] right = ...
  int[] result = new int[left.Length];

  ...

  Parallel.For(0, left.Length, i => result[i] = left[i] + right[i]);

Let's have a look on the effect:

  int N = 100_000_000;

  int[] left   = new int[N];
  int[] right  = new int[N];
  int[] result = new int[left.Length];

  // To prevent garbage collection while testing
  GC.Collect(2);

  Stopwatch sw = new Stopwatch();

  sw.Start();

  // Parallel version
  //Parallel.For(0, left.Length, i => result[i] = left[i] + right[i]);

  // Standard for loop version
  for (int i = left.Length - 1; i >= 0; --i)
    result[i] = left[i] + right[i];

  sw.Stop();

  Console.Write(sw.ElapsedMilliseconds);

Outcome (.Net 6 IA-64, Realease build, Core i9, 6 cores)

  200 - parallel version
  500 - for loop version
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
1

When you have granular work to do, like adding two integers, and you use the Parallel.For method, you'll find that the synchronization overhead, as well as the overhead of invoking a non-inlinable lambda for each index, negates any performance gained by the paralellization. In this case it's a good idea to chunkify the workload by operating on ranges of indices, instead of one index at a time. Here is how you can use the Parallel.ForEach+Partitioner.Create methods for this purpose:

int[] left = new int[1_000_000];
int[] right = new int[1_000_000];
int[] sum = new int[1_000_000];

ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Partitioner<Tuple<int, int>> partitioner = Partitioner.Create(0, left.Length);
    
Parallel.ForEach(partitioner, parallelOptions, (Tuple<int, int> range) =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        sum[i] = left[i] + right[i];
    }
});

The Partitioner.Create creates by default about three times as many ranges as the Environment.ProcessorCount value, so on a quad core machine it will create about 12 ranges in total. This is a good compromise between too many ranges (overhead) and too few ranges (unbalanced workload). If you want you can finetune the number of the partitions, by configuring the third optional rangeSize argument.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104