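If you can get a good key for each line, I suggest using a HashSet<T> rather than All() to check each line. All() rescans the whole collection for every line, while a HashSet lookup is a constant-time operation on average. A simple/naive example might look like this: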
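```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hash codes of the upper-cased lines kept so far (ints only, no string copies).
var lineKeys = new HashSet<int>();
foreach (var line in File.ReadLines(ofd.FileName))
{
    int hash = line.ToUpper().GetHashCode();
    // Add() returns true for an unseen hash, so the line is new and no scan is needed.
    // On a repeated hash, fall back to the full comparison to rule out a collision.
    if (lineKeys.Add(hash) || analysisDatas.All(analysisData => !string.Equals(analysisData.Text, line, StringComparison.CurrentCultureIgnoreCase)))
    {
        var item = new AnalysisData { Text = line };
        analysisDatas.Add(item);
    }
}
```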
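To try the approach outside the original form code, here is a minimal, self-contained sketch. It assumes C# 9+ top-level statements; DistinctLines is a hypothetical helper name, and a plain List<string> stands in for the analysisDatas collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a library API): same hash-then-fallback logic as the loop
// above, but over any sequence of strings and returning the lines it keeps.
List<string> DistinctLines(IEnumerable<string> lines)
{
    var kept = new List<string>();
    var keys = new HashSet<int>(); // hashes of the upper-cased lines already kept

    foreach (var line in lines)
    {
        int hash = line.ToUpper().GetHashCode();
        // New hash: no kept line has the same ToUpper() form, so keep it without scanning.
        // Repeated hash: either a real duplicate or a collision, so do the exact check.
        if (keys.Add(hash) ||
            kept.All(k => !string.Equals(k, line, StringComparison.CurrentCultureIgnoreCase)))
        {
            kept.Add(line);
        }
    }
    return kept;
}

var result = DistinctLines(new[] { "apple", "Banana", "APPLE", "banana", "cherry" });
Console.WriteLine(string.Join(", ", result)); // apple, Banana, cherry
```

On this input, "APPLE" and "banana" produce hash codes that are already in the set, so only they pay for the All() scan, and both are dropped as duplicates. "cherry" has a new hash and is kept without scanning anything.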
Note I said, "If". Comparing hash codes of ToUpper() strings is not exactly the same as comparing with StringComparison.CurrentCultureIgnoreCase. Some cultures have characters that need special handling based on accents or similar, so two lines that the culture-aware comparison treats as equal can still produce different hash codes, and both would be kept. This might be a problem in your situation, or it might not; you'll have to look at your data and evaluate your needs. Don't short yourself on that evaluation.
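As one illustration of the accent issue, the sketch below (C# 9+ top-level statements) builds "café" two ways: with the precomposed é (U+00E9) and with a plain e followed by a combining acute accent (U+0301). The variable names are only for illustration.

```csharp
using System;
using System.Globalization;

// Same visible text, two encodings: precomposed U+00E9 vs "e" + combining U+0301.
string precomposed = "caf\u00E9";
string decomposed = "cafe\u0301";

// The HashSet key is built from ToUpper(), which leaves the two code-unit
// sequences different, so their hash codes will (almost certainly) differ too.
bool sameUpper = precomposed.ToUpper() == decomposed.ToUpper();

// A culture-aware comparison may treat the two forms as equivalent.
bool sameCulture = string.Equals(precomposed, decomposed, StringComparison.CurrentCultureIgnoreCase);

Console.WriteLine($"Culture: {CultureInfo.CurrentCulture.Name}");
Console.WriteLine($"Same after ToUpper(): {sameUpper}"); // False
Console.WriteLine($"Equal under CurrentCultureIgnoreCase: {sameCulture}");
```

With the usual NLS or ICU globalization support, the culture-aware comparison generally reports the two strings as equal, yet the hash-set loop would keep both lines. If the app runs with invariant globalization enabled, culture comparisons behave ordinally and the second check reports False as well, which is one more reason to test against your own data and environment.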
Also note my use of int for the HashSet. I could have stored the strings themselves. However, then we'd keep two copies of every line in memory: the original string in the analysisDatas collection, and the uppercase string in the HashSet. Even though a HashSet<string> uses hash codes to find matches, it still has to store each full string. Storing only the int lets the GC collect each uppercase copy as soon as its hash has been computed. Since you've already run into OutOfMemoryException issues, I opted to accept the risk of occasional wrong matches in exchange for lower memory use.