Partitioning and multi-threading can noticeably improve performance. Here is a .NET library I wrote called Gigantor that is designed to do regular expression searches of large files using partitioning and multi-threading (available as source or as a NuGet package). According to my benchmarking, you should expect roughly an 8x improvement for uncompressed text with a simple regex. I'd like to learn more about your use case if you try this and don't get similar results (or even if you do).
Here is an example that benchmarks single-threaded vs multi-threaded performance using Gigantor's RegexSearcher class.
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Threading;
using Imagibee.Gigantor;

public void Benchmark(string path, string word)
{
    // Create the dependencies we will need
    Regex regex = new(word, RegexOptions.Compiled);
    AutoResetEvent progress = new(false);
    Stopwatch stopwatch = new();

    // Benchmark results with differing numbers of threads
    foreach (var numWorkers in new List<int>() { 1, 128 }) {
        Console.WriteLine($"Starting search with {numWorkers} thread(s)");
        stopwatch.Start();

        // Create the searcher
        RegexSearcher searcher = new(
            path,
            regex,
            progress,
            maxMatchCount: 1000,
            maxWorkers: numWorkers);

        // Start the searcher and wait for it to complete
        Background.StartAndWait(
            new List<IBackground>() { searcher },
            progress,
            (_) => { Console.Write("."); },
            1000);
        Console.Write('\n');

        // Display results
        var runTime = stopwatch.Elapsed.TotalSeconds;
        Console.WriteLine($"Completed in {runTime} seconds");
        stopwatch.Reset();
    }
}
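To illustrate the underlying idea (this is not Gigantor's actual implementation, just a hand-rolled sketch of the partitioning technique): split the file into fixed-size chunks, search each chunk on its own thread, and read a small overlap past each chunk boundary so matches that straddle a boundary are not missed. The helper name PartitionedSearch and the chunk/overlap sizes below are my own choices, and the sketch assumes single-byte text and matches shorter than the overlap:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

static class PartitionedSearch
{
    // Count regex matches by searching fixed-size chunks in parallel.
    // Each chunk is extended by 'overlap' bytes so a match that crosses
    // a chunk boundary is still seen; such a match is counted only by
    // the chunk it starts in (m.Index < chunkSize) to avoid duplicates.
    public static int CountMatches(
        string path, Regex regex,
        int chunkSize = 512 * 1024, int overlap = 256)
    {
        long fileLength = new FileInfo(path).Length;
        var chunkStarts = new List<long>();
        for (long pos = 0; pos < fileLength; pos += chunkSize) {
            chunkStarts.Add(pos);
        }
        int total = 0;
        Parallel.ForEach(chunkStarts, start => {
            // One file handle per thread so reads don't contend on Seek
            using var stream = File.OpenRead(path);
            stream.Seek(start, SeekOrigin.Begin);
            var buffer = new byte[chunkSize + overlap];
            int read = stream.Read(buffer, 0, buffer.Length);
            var text = Encoding.UTF8.GetString(buffer, 0, read);
            int count = 0;
            foreach (Match m in regex.Matches(text)) {
                if (m.Index < chunkSize) {
                    count++;
                }
            }
            Interlocked.Add(ref total, count);
        });
        return total;
    }
}
```

Gigantor handles the pieces this sketch leaves out (progress reporting, match capture, tuned buffer sizes), but the speedup comes from the same chunk-per-thread structure.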
About the Benchmarking
The benchmark consists of searching enwik9 for the regex pattern @"food". Five iterations are run and the average throughput is used. The benchmarking source code is here. The command and results are shown below.
$ dotnet SearchApp/bin/Release/net6.0/SearchApp.dll benchmark ${TMPDIR}/enwik9
........................
maxWorkers=1, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 24.0289207 seconds
-> 208.0825877460239 MBytes/s
..............
maxWorkers=2, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 12.692795 seconds
-> 393.92426963485974 MBytes/s
.........
maxWorkers=4, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 6.8668367 seconds
-> 728.1373095707955 MBytes/s
....
maxWorkers=8, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 3.7174496 seconds
-> 1345.0081475213544 MBytes/s
....
maxWorkers=16, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 3.0211296 seconds
-> 1655.0100995336313 MBytes/s
....
maxWorkers=32, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 3.191699 seconds
-> 1566.5637643148682 MBytes/s
....
maxWorkers=64, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 3.2240221 seconds
-> 1550.8578554718963 MBytes/s
....
maxWorkers=128, chunkKiBytes=512, maxThread=32767
105160 matches found
searched 5000000000 bytes in 3.3693127 seconds
-> 1483.982178323787 MBytes/s
The value of 8x was computed by dividing the throughput at maxWorkers=16 by the throughput at maxWorkers=1 (1655 / 208 = 7.96). This is the improvement I was referring to.
NOTES:
- The above results are with net6 targeted. I have recently re-run the benchmark with net7, and single-threaded performance improved substantially to 416 MBps. So with net7 the peak throughput improved to 1771 MBps, but the relative improvement decreased to about 4x.
- The pattern @"food" is a very simple regex. For more complex regexes that require more CPU per byte, I would expect a greater relative performance improvement from this approach.
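To see why, you can compare the single-threaded CPU cost of a simple and a more complex pattern on the same input; the more time each byte costs one core, the more there is to gain from spreading chunks across cores. The complex pattern below is just an illustrative stand-in I picked, not one from the benchmark:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Time a simple and a more complex pattern over the same text to see
// the difference in per-byte CPU cost on a single thread.
var text = string.Concat(Enumerable.Repeat("some food for thought ", 100000));
foreach (var pattern in new[] { @"food", @"\b(\w+)\s+for\s+(\w+)\b" })
{
    var regex = new Regex(pattern, RegexOptions.Compiled);
    var stopwatch = Stopwatch.StartNew();
    int count = regex.Matches(text).Count;
    stopwatch.Stop();
    Console.WriteLine($"{pattern}: {count} matches in {stopwatch.ElapsedMilliseconds} ms");
}
```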