I have two list of strings:
private List<string> sortedList; //about 150k items
private List<string> mixedList; // about 200k items
One of them is sorted, the other one is not.
I use the following code to find the intersecting words:
List<string> intersectingWords=new List<string>();
StringInvariantComparer stringComparer = new StringInvariantComparer();
foreach (var item in mixedList)
{
int res = sortedList.BinarySearch(item, stringComparer);
if (res >= 0 && !intersectingWords.Contains(item))
intersectingWords.Add(item);
}
public class StringInvariantComparer : IComparer<string>
{
public int Compare(string x, string y)
{
return string.Compare(x, y, StringComparison.InvariantCultureIgnoreCase);
}
}
Unfortunately, it takes about 20-25 seconds which is way too much time.
If I change the StringComparison
type to Ordinal
, it finishes in 1-2 seconds but can't find 30% of the words.
What would you recommend to optimize the code?
UPDATE:
If I use the following code:
var wrongRes = sortedList.Intersect(mixedList).ToList();
The result:
If I use the following code:
var upperMixed = new List<string>();
var upperSorted = new List<string>();
foreach (var item in mixedList)
upperMixed.Add(item.ToUpper());
foreach (var item in sortedList)
upperSorted.Add(item.ToUpper());
foreach (var word in upperSorted.Intersect(upperMixed))
{
intersectingWords.Add(word.ToLower());
}
The result:
They are clearly not the same :)