Gigantor provides a RegexSearcher that can do this. I tested your example with a 32 GB file I had lying around, and it took less than 20 seconds on my MacBook Pro.
Gigantor boosts the performance of regular expressions and works with gigantic files. Here's what the code for your Search function would look like using Gigantor.
using System;
using System.Collections.Generic;
using System.IO;

public List<string> Search(string path, string searchKey)
{
    // Create regex to search for the searchKey (treated as a regex pattern)
    System.Text.RegularExpressions.Regex regex = new(searchKey);
    List<string> results = new List<string>();

    // Create Gigantor stuff
    System.Threading.AutoResetEvent progress = new(false);
    Imagibee.Gigantor.RegexSearcher searcher = new(
        path, regex, progress, maxMatchCount: 10000);

    // Start the search and wait for completion
    Imagibee.Gigantor.Background.StartAndWait(
        searcher,
        progress,
        (_) => { },  // no-op callback
        1000);

    // Check for errors
    if (searcher.Error.Length != 0) {
        throw new Exception(searcher.Error);
    }

    // Open the searched file for reading
    using System.IO.FileStream fileStream = new(path, FileMode.Open);
    Imagibee.Gigantor.StreamReader reader = new(fileStream);

    // Seek to each match and read from there to the end of the line
    foreach (var match in searcher.GetMatchData()) {
        fileStream.Seek(match.StartFpos, SeekOrigin.Begin);
        results.Add(reader.ReadLine());
    }
    return results;
}
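One thing to keep in mind: Search passes searchKey straight to the Regex constructor, so characters such as `.` or `(` are interpreted as regex syntax. If your search keys are meant to be literal text, one option is to escape them first. Below is a minimal sketch using only System.Text.RegularExpressions; SearchKeyHelper and BuildLiteralRegex are hypothetical names I made up for illustration and are not part of Gigantor.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchKeyHelper
{
    // Hypothetical helper (not part of Gigantor): builds a regex that
    // matches searchKey as literal text by escaping regex metacharacters.
    public static Regex BuildLiteralRegex(string searchKey)
    {
        if (searchKey is null) {
            throw new ArgumentNullException(nameof(searchKey));
        }
        // Regex.Escape turns e.g. "a.b" into a pattern that matches only "a.b"
        return new Regex(Regex.Escape(searchKey), RegexOptions.Compiled);
    }
}
```

The resulting Regex could then be handed to the searcher in place of `new(searchKey)`. Whether RegexOptions.Compiled helps for your workload is something to measure rather than assume.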
Here's the test code.
[Test]
public void SearchTest()
{
    var path = Path.Combine(Path.GetTempPath(), "enwik9x32");
    Stopwatch stopwatch = new();
    stopwatch.Start();
    var results = Search(path, "unicorn");
    stopwatch.Stop();
    Console.WriteLine($"found {results.Count} results in {stopwatch.Elapsed.TotalSeconds} seconds");
}
Here's the console output:
found 8160 results in 19.1458573 seconds
And here's the Gigantor source repo. I know it's a little late, but hopefully this answer is helpful to someone.