Problem (see the Edit section for clarifications)
I have a list of about 1500 strings. For each of these strings I have to check whether any of the roughly 4000 files in a directory (including subdirectories) contains it.
Code
What I have now are these two variants:
Original
foreach(var str in stringList)
{
allFiles.Any(f => File.ReadAllText(f).Contains(str));
}
Second variant (using ReadLines instead of ReadAllText, as suggested by VladL in this question)
foreach(var str in stringList)
{
allFiles.SelectMany(File.ReadLines).Any(line => line.Contains(str));
}
I only tested the complete program execution with the original variant, and it took 21 minutes to finish. I then timed a single statement (checking whether one string is contained in any file), searching for a string that I knew was not in any file in order to hit the worst-case scenario. These are my timings (each executed 3 times):
Original: 1285, 1369, 1336 ms
Second variant: 2718, 2804, 2831 ms
I also tried replacing ReadAllText with ReadAllLines in the Original statement (without changing anything else), but performance did not change.
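For reference, here is a minimal sketch of how single-statement timings like the ones above could be taken with Stopwatch. The class and method names (SearchTiming, Measure, ContainsReadAllText, ContainsReadLines) are hypothetical and only wrap the two statements shown earlier; this is not the exact harness that produced the numbers above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class SearchTiming
{
    // Original variant: load each file fully and search the whole text.
    public static bool ContainsReadAllText(IEnumerable<string> files, string str) =>
        files.Any(f => File.ReadAllText(f).Contains(str));

    // Second variant: stream the lines of all files and search line by line.
    // Note: this cannot match a string that spans a line break.
    public static bool ContainsReadLines(IEnumerable<string> files, string str) =>
        files.SelectMany(f => File.ReadLines(f)).Any(line => line.Contains(str));

    // Runs the search once, returns its result and reports the elapsed
    // wall-clock time in milliseconds through elapsedMs.
    public static bool Measure(Func<bool> search, out long elapsedMs)
    {
        var stopwatch = Stopwatch.StartNew();
        var found = search();
        stopwatch.Stop();
        elapsedMs = stopwatch.ElapsedMilliseconds;
        return found;
    }
}
```

Calling `SearchTiming.Measure(() => SearchTiming.ContainsReadAllText(allFiles, str), out var ms)` with a string that occurs in no file forces every file to be read, which is the worst case described above. Repeated runs may also be affected by the operating system's file cache, so the first run can be slower than later ones.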
Question
Is there a faster way to check whether a string is contained in any file (given a large number of large files)?
Edit
I admit I didn't express myself as clearly as I wanted by saying I have a list of strings. What I actually have is a list of CSV files. I iterate through those files and then through each line of each file (ignoring the header line). From each line I compose a string out of some of its fields, and then check whether any file contains that string.
foreach(var csvFile in csvFiles)
{
var lines = File.ReadAllLines(csvFile);
foreach(var line in lines)
{
if (IsHeader(line)) continue;
var str = ComposeString(line);
var found = allFiles.Any(f => File.ReadAllText(f).Contains(str));
// do stuff with the line and found
}
}
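In the loop above, every composed string triggers a fresh pass over the files, so the same files are read from disk again and again. One possible restructuring, sketched here under the assumption that all composed strings can be collected before searching, is to read each file only once and test all strings against its text. The names SingleReadSearch and FindContained are hypothetical, and I have not measured this version.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SingleReadSearch
{
    // Returns the subset of searchStrings that occur in at least one file.
    // Each file is read from disk at most once.
    public static HashSet<string> FindContained(
        IEnumerable<string> files, IEnumerable<string> searchStrings)
    {
        var remaining = new HashSet<string>(searchStrings);
        var found = new HashSet<string>();

        foreach (var file in files)
        {
            // Stop early once every string has been located somewhere.
            if (remaining.Count == 0) break;

            var text = File.ReadAllText(file);

            // Move every string contained in this file from remaining to found.
            remaining.RemoveWhere(s =>
            {
                if (!text.Contains(s)) return false;
                found.Add(s);
                return true;
            });
        }

        return found;
    }
}
```

With the result in hand, the per-line check in the CSV loop becomes a set lookup such as `foundSet.Contains(str)`. This still performs one Contains call per string and file, so it reduces the disk reads rather than the number of string comparisons.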
Edit 2
public void ExecuteAhoCorasick()
{
var table = CreateDataTable();
var allFiles = GetAllFiles();
var csvFiles = GetCsvFiles();
var resList = new List<string>();
foreach (var file in csvFiles)
{
if (file.Contains("ValueList_")) continue;
var lines = File.ReadAllLines(file);
foreach (var line in lines)
{
if (line == HeaderLine) continue;
var res = line.Split(';');
if (res.Length <= 7) continue;
var resPath = $"{res[0]}.{res[1]}.{res[2]}".Trim('.');
resList.Add(resPath);
var row = table.NewRow();
row[0] = res[0]; // Group
row[1] = res[1]; // Type
row[2] = res[2]; // Key
row[3] = res[3]; // Global
row[4] = res[4]; // De
row[5] = res[5]; // Fr
row[6] = res[6]; // It
row[7] = res[7]; // En
row[8] = resPath; // Resource Path
row[9] = false;
row[10] = ""; // Comment
row[11] = file; // File Path
table.Rows.Add(row);
}
}
var foundRes = new List<string>();
foreach (var file in allFiles)
{
// var chars = File.ReadLines(file).SelectMany(line => line);
var text = File.ReadAllText(file);
var trie = new Trie();
trie.Add(resList);
foundRes.AddRange(trie.Find(text));
// foundRes.AddRange(trie.Find(chars));
}
// update row[9] to true foreach res in foundRes
}