Given:
- Hundreds of .NET projects
- Thousands of C# files across all the projects
- A string literal
I want to output all the matches of the given literal in all the files across all the projects. I would like to use this example to understand how to optimize the performance of a simple TPL Dataflow pipeline.
The complete code is committed on GitHub: https://github.com/MarkKharitonov/LearningTPLDataFlow/blob/master/FindStringCmd.cs
The pipeline itself is:
private void Run(string workspaceRoot, string literal, int maxDOP1 = 1, int maxDOP2 = 1)
{
    var projects = (workspaceRoot + "build\\projects.yml").YieldAllProjects();

    var produceCSFiles = new TransformManyBlock<ProjectEx, CSFile>(YieldCSFiles,
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDOP1 });
    var produceMatchingLines = new TransformManyBlock<CSFile, MatchingLine>(
        csFile => csFile.YieldMatchingLines(literal),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDOP2 });
    var getMatchingLines = new ActionBlock<MatchingLine>(o => Console.WriteLine(o.ToString(workspaceRoot)));

    var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
    produceCSFiles.LinkTo(produceMatchingLines, linkOptions);
    produceMatchingLines.LinkTo(getMatchingLines, linkOptions);

    Console.WriteLine($"Locating all the instances of {literal} in the C# code ...");
    var sw = Stopwatch.StartNew();

    projects.ForEach(p => produceCSFiles.Post(p));
    produceCSFiles.Complete();
    getMatchingLines.Completion.Wait();

    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}
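For readers without the rest of the repository, the two-stage shape above can be reproduced in a minimal, self-contained sketch. The ProjectEx, CSFile and MatchingLine types are replaced by plain strings here (an assumption for illustration, not the repository's actual code); it only needs the System.Threading.Tasks.Dataflow namespace, which is included in modern .NET and otherwise available as a NuGet package:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

class PipelineSketch
{
    static void Main()
    {
        // Stage 1: one "project" fans out into several "files".
        var produceFiles = new TransformManyBlock<string, string>(
            project => new[] { project + "/a.cs", project + "/b.cs" },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

        // Stage 2: one "file" fans out into its matching "lines".
        var produceLines = new TransformManyBlock<string, string>(
            file => new[] { file + ": line 1", file + ": line 2" },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

        // Terminal stage: print every match.
        var print = new ActionBlock<string>(Console.WriteLine);

        // With PropagateCompletion, completing the head block eventually
        // completes the tail block once all in-flight items drain.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        produceFiles.LinkTo(produceLines, linkOptions);
        produceLines.LinkTo(print, linkOptions);

        produceFiles.Post("proj1");
        produceFiles.Post("proj2");
        produceFiles.Complete();
        print.Completion.Wait();  // 2 projects x 2 files x 2 lines = 8 lines printed
    }
}
```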
Here are some notes:
- It is very cheap to obtain ProjectEx objects.
- Accessing the ProjectEx.MSBuildProject property for the very first time is quite expensive. This is where the Microsoft Build API evaluates the respective .csproj file.
- After the evaluation, getting the list of C# files is very cheap, but processing them all is quite expensive, because there are so many of them.
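The "expensive on first access, cheap afterwards" behavior described above is the classic lazy-initialization pattern. A hypothetical sketch of how such a ProjectEx could be structured (the type and its evaluation step are stand-ins, not the repository's actual implementation):

```csharp
using System;

// Hypothetical stand-in for the real ProjectEx: construction is cheap,
// the first access to MSBuildProject triggers the expensive evaluation,
// and every later access returns the cached result.
class ProjectEx
{
    private readonly Lazy<string> _evaluated;

    public ProjectEx(string csprojPath)
    {
        // Cheap: nothing is evaluated yet.
        _evaluated = new Lazy<string>(() =>
        {
            Console.WriteLine($"Evaluating {csprojPath} ...");  // the expensive step
            return $"<evaluated {csprojPath}>";
        });
    }

    public string MSBuildProject => _evaluated.Value;
}

class Demo
{
    static void Main()
    {
        var p = new ProjectEx("Foo.csproj");  // cheap
        Console.WriteLine(p.MSBuildProject);  // expensive: runs the evaluation
        Console.WriteLine(p.MSBuildProject);  // cheap: returns the cached value
    }
}
```

Lazy&lt;T&gt; is thread-safe by default, which matters once multiple Dataflow workers may touch the same project concurrently.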
I am not sure how to depict the pipeline graphically here, but:
- produceCSFiles is fed cheap ProjectEx objects and outputs a lot of CSFile objects, which is expensive due to project evaluation.
- produceMatchingLines is fed CSFile objects and outputs the matching lines, which is expensive due to the sheer quantity of CSFile objects and the number of lines to process.
My question - is my implementation optimal? I have doubts, because increasing maxDOP1 and maxDOP2 does not yield much of an improvement:
C:\work\TPLDataFlow [master ≡ +0 ~2 -0 !]> 1..4 |% { $MaxDOP1 = $_ ; 1..4 } |% { $MaxDOP2 = $_ ; $res = .\bin\Debug\net5.0\TPLDataFlow.exe find-string -d C:\dayforce\tip -l GetClientLegalPromptFlag --maxDOP1 $MaxDOP1 --maxDOP2 $MaxDOP2 -q ; "$MaxDOP1 x $MaxDOP2 --> $res" }
1 x 1 --> Elapsed: 00:00:21.1683002
1 x 2 --> Elapsed: 00:00:19.8194133
1 x 3 --> Elapsed: 00:00:20.2626202
1 x 4 --> Elapsed: 00:00:20.4339065
2 x 1 --> Elapsed: 00:00:17.6475658
2 x 2 --> Elapsed: 00:00:15.4889941
2 x 3 --> Elapsed: 00:00:14.9014116
2 x 4 --> Elapsed: 00:00:14.9254166
3 x 1 --> Elapsed: 00:00:17.6474953
3 x 2 --> Elapsed: 00:00:14.4933295
3 x 3 --> Elapsed: 00:00:14.2419329
3 x 4 --> Elapsed: 00:00:14.1185203
4 x 1 --> Elapsed: 00:00:19.0717189
4 x 2 --> Elapsed: 00:00:15.9069517
4 x 3 --> Elapsed: 00:00:16.3267676
4 x 4 --> Elapsed: 00:00:17.0876474
C:\work\TPLDataFlow [master ≡ +0 ~2 -0 !]>
What I see is:
- Max improvement is with maxDOP1 == 3 and maxDOP2 == 4 - 14.12 sec vs 21.17 sec
- Max ROI is with maxDOP1 == 2 and maxDOP2 == 3 - 15 sec vs 21.17 sec
All in all, that is just a ~30% improvement over the single-threaded version. This is a bit disappointing, because all the files are on an SSD and I have 12 logical processors. And, of course, the code is much more complicated.
Am I missing anything? Maybe I am not doing it in an optimal fashion?