Parallel programming is not that simple. Using Parallel.For() or Parallel.ForEach() doesn't automatic make your program parallel.
Parallel programming is not about calling any higher level function (in any programming language) to make your code parallel. Is about prepare your code to be parallel.
Actually, you are not paralleling anything at all neither func1 or func2.
Backing to the foundation, the two basic types of parallelism are:
By task, which you split a complex task in smaller subtasks, each subtask to be processed at same time for different cores, CPUs or nodes (in a computer cluster)
By data, which you split a large data set into several smaller slices, each slice to be processed at same time for different cores, CPUs or nodes
Data parallelism is way more trickier to achieve and and not always provide a real performance gain.
Func1 is not really parallel, it's just a heavy piece of computation running concurrently. (Your CPU are just disputing who will finish the 100M for loop first)
Using Parallel.For() you are just spawning this heavy function 100 times among your threads.
A single for loop with Task.Run() inside would have nearly the same result
If your run this in only one thread/core obviously will take sometime. If you run in all your cores will be faster. No big mistery here, although being a concurrent code, not actually parallel. Besides, invoking these tasks 100 times, if you don't have these amount of CPU cores (or nodes in cluster) there's no big difference, parallel/concurrent code will be limit by the actual CPU cores in the machine (will see in a future example)
Now about the Func2 and the interaction with memory heap. Yes, every modern language with a built-in GC it's CPU expensive. One of the most expensive operation in an complex algorithm it's Garbage Collection, sometimes ad in non-optimized codes it can represents over 90% of CPU time.
Let's analyze your function2
- Declare a new Dictionary into the function scope
- Populate this Dictionary with 100M items
- Outer the scope, you called function2 inside a Parallel.For with 100 interations
- 100 different scopes populate 100 different Dictionary with 100M data
- There's no interaction between any of these scopes
As said before, this is not parallel programming, this is concurrent programming. You have separete 100 data chunks of 100M entries in each scope that doesn't intereact each other
But also there's a second factor too. Your function2 operation is a write operation (it means your adding-updading-deleting something to a collection). Well if it's just a bunch of random data and you can admit some loss and inconsistency okay. But if your're handling real data and cannot allow any kind of loss or inconsistency, bad news. There's no true parallel for writing a same memory address (object reference). You will need a synchronization contex and this will make things way slower, and these syncronized operations will always be concurrent, because if a thread is writing on memory reference, the other thread must wait until the other thread leaves. Actually, using several threads to write data might make your code slower instead faster, specially if the parallel operations are not CPU-bound.
For having real gains with data parallelism, you must have been using heavy computations uppon these partitioned data.
Let's check come code below, based on your methodology but with some changes:
var rand = new Random();
var operationSamples = 256;
var datasetSize = 100_000_000;
var computationDelay = 50;
var cpuCores = Environment.ProcessorCount;
Dictionary<int, int> datasetWithLoss = new(datasetSize);
Dictionary<int, int> dataset = new(datasetSize);
double result = 0;
Stopwatch sw = new();
ThreadPool.SetMinThreads(1, 1);
int HeavyComputation(int delay)
{
int iterations = 0;
var end = DateTime.Now + TimeSpan.FromMilliseconds(delay);
while (DateTime.Now < end)
iterations++;
return iterations;
}
double SequentialMeanHeavyComputation(int maxMilliseconds, int samples = 64)
{
double sum = 0;
for (int i = 0; i < samples; i++)
sum += HeavyComputation(maxMilliseconds);
return sum / samples;
}
double ParallelMeanHeavyComputation(int maxSecondsCount, int samples = 64, int threads = 4)
{
ThreadPool.SetMaxThreads(threads, threads);
ThreadPool.GetAvailableThreads(out int workerThreads, out _);
Console.WriteLine($"Available Threads: {workerThreads}");
var _lockKey = new object();
double sum = 0;
int offset = samples / threads;
List<Action> tasks = new();
for (int i = 0; i < samples; i++)
tasks.Add(new Action(() =>
{
var result = HeavyComputation(maxSecondsCount);
lock (_lockKey)
sum += result;
}));
Parallel.Invoke(new ParallelOptions { MaxDegreeOfParallelism = threads }, tasks.ToArray());
return sum / samples;
}
void SequentialDatasetPopulation(int size)
{
for (int i = 0; i < datasetSize; i++)
dataset.TryAdd(i, Guid.NewGuid().GetHashCode());
}
void ParalellDatasetPopulation(int size, int threads)
{
var _lock = new object();
ThreadPool.SetMaxThreads(threads, threads);
ThreadPool.GetAvailableThreads(out int workerThreads, out _);
Console.WriteLine($"Available Threads: {workerThreads}");
Parallel.For(0, datasetSize, new ParallelOptions { MaxDegreeOfParallelism = threads }, (i) =>
{
var value = Guid.NewGuid().GetHashCode();
lock (_lock)
dataset.Add(i, value);
});
}
double SequentialReadOnlyDataset()
{
foreach (var x in dataset)
{
HeavyComputation((int)Math.Tan(Math.Cbrt(Math.Log(Math.Log(x.Value)))) / 10);
}
return 0;
}
double ParallelReadOnlyDataset()
{
Parallel.ForEach(dataset, x =>
{
HeavyComputation((int)Math.Tan(Math.Cbrt(Math.Log(Math.Log(x.Value)))) / 10);
});
return 0;
}
void ParalellDatasetWithLoss(int size, int threads)
{
ThreadPool.SetMaxThreads(threads, threads);
ThreadPool.GetAvailableThreads(out int workerThreads, out _);
Console.WriteLine($"Available Threads: {workerThreads}");
Parallel.For(0, datasetSize, new ParallelOptions { MaxDegreeOfParallelism = threads }, (i) =>
{
int value = Guid.NewGuid().GetHashCode();
datasetWithLoss.Add(i, value);
});
}
sw.Restart();
result = SequentialMeanHeavyComputation(computationDelay, operationSamples);
sw.Stop();
Console.WriteLine($"{nameof(SequentialMeanHeavyComputation)} sequential tasks: {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
sw.Restart();
result = ParallelMeanHeavyComputation(computationDelay, operationSamples, threads: cpuCores);
sw.Stop();
Console.WriteLine($"{nameof(ParallelMeanHeavyComputation)} parallel tasks (CPU threads match count): {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
sw.Restart();
result = ParallelMeanHeavyComputation(computationDelay, operationSamples, threads: 100);
sw.Stop();
Console.WriteLine($"{nameof(ParallelMeanHeavyComputation)} parallel tasks (Higher thread count): {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
sw.Restart();
result = ParallelMeanHeavyComputation(computationDelay, operationSamples, threads: 4);
sw.Stop();
Console.WriteLine($"{nameof(ParallelMeanHeavyComputation)} parallel tasks (Lower thread count): {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
sw.Restart();
SequentialDatasetPopulation(datasetSize);
sw.Stop();
Console.WriteLine($"{nameof(SequentialDatasetPopulation)} sequential data population: {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
dataset.Clear();
sw.Restart();
ParalellDatasetPopulation(datasetSize, cpuCores);
sw.Stop();
Console.WriteLine($"{nameof(ParalellDatasetPopulation)} parallel data population: {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
sw.Restart();
ParalellDatasetWithLoss(datasetSize, cpuCores);
sw.Stop();
Console.WriteLine($"{nameof(ParalellDatasetWithLoss)} parallel data with loss: {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
Console.WriteLine($"Lossless dataset count: {dataset.Count}");
Console.WriteLine($"Dataset with loss: {datasetWithLoss.Count}\n");
datasetWithLoss.Clear();
sw.Restart();
SequentialReadOnlyDataset();
sw.Stop();
Console.WriteLine($"{nameof(SequentialReadOnlyDataset)} sequential reading operations: {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
sw.Restart();
ParallelReadOnlyDataset();
sw.Stop();
Console.WriteLine($"{nameof(ParallelReadOnlyDataset)} parallel reading operations: {sw.Elapsed.TotalMilliseconds.ToString("### ##0.0ms\n")}");
Console.Read();
Output:
SequentialMeanHeavyComputation sequential tasks: 12 800,7ms
Available Threads: 15
ParallelMeanHeavyComputation parallel tasks (CPU threads match count): 860,3ms
Available Threads: 99
ParallelMeanHeavyComputation parallel tasks (Higher thread count): 805,0ms
Available Threads: 3
ParallelMeanHeavyComputation parallel tasks (Lower thread count): 3 200,4ms
SequentialDatasetPopulation sequential data population: 9 072,4ms
Available Threads: 15
ParalellDatasetPopulation parallel data population: 23 420,0ms
Available Threads: 15
ParalellDatasetWithLoss parallel data with loss: 6 788,3ms
Lossless dataset count: 100000000
Dataset with loss: 77057456
SequentialReadOnlyDataset sequential reading operations: 20 371,0ms
ParallelReadOnlyDataset parallel reading operations: 3 020,6ms
(Red: 25%, Orange: 56%, Green: 75%, Blue: 100%)
With task parallelism we achieved over 20x performance using 100% of CPU threads. (in this example, not always like that)
In read-only data paralelism with some computation we achieve near 6,5x faster of CPU usage 56% (with fewer computations the difference would be shorter)
But trying to implement a "real parallism" of data for writing our performance is more than twice slower and CPU can't use full potential using only 25% usage due sycronization contexts
Conclusions:
Using Parallel.For does not guarantee that your code will run really in parallel neither faster. It requires a previous code/data preparation and deep analysis, benchmarks and tunings
Check also this Microsoft Documentation talking about villains in Parallel Code
https://learn.microsoft.com/pt-br/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism