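You will most likely see an improvement if you use a HashSet<KeyValuePair<string, string>>. Contains on a HashSet is a hash lookup (O(1) on average), whereas on a List it is a linear scan over every element.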
The test below finishes on my machine in about 10 seconds. If I change...
var collection = new HashSet<KeyValuePair<string, string>>();
...to...
var collection = new List<KeyValuePair<string, string>>();
...I get tired of waiting for it to complete (more than a few minutes).
Using a KeyValuePair<string, string> has the advantage that equality is determined by the values of Key and Value. KeyValuePair<TKey, TValue> is a struct, so its default Equals compares the fields, and strings compare by value, so pairs with the same Key and Value will be considered equal by the runtime.
You can see that equality with this test:
var hs = new HashSet<KeyValuePair<string, string>>();
hs.Add(new KeyValuePair<string, string>("key", "value"));
var b = hs.Contains(new KeyValuePair<string, string>("key", "value"));
Console.WriteLine(b);
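One thing that's important to remember, though, is that this equality does not depend on string interning. string.Equals and string.GetHashCode work on the characters, not on the reference, so strings that aren't interned (because they come from a file or something) still compare equal. The comparison is ordinal and case-sensitive, however, so "Key" and "key" are different keys.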
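If you want to check how the set treats strings that are not interned, a minimal sketch like the following can help. PairEqualityDemo and RuntimeString are names I made up for illustration; the only assumption is that building a string from a char array at run time gives you a new, non-interned object.

```csharp
using System;
using System.Collections.Generic;

internal static class PairEqualityDemo {
    // Hypothetical helper: copies the characters into a brand-new string object,
    // so the result is not the interned literal even if the text is identical.
    internal static string RuntimeString(string text) {
        return new string(text.ToCharArray());
    }

    // Returns true if a pair built from run-time strings is found in a set
    // that was filled with a pair built from literals.
    internal static bool ContainsRuntimeBuiltPair() {
        var set = new HashSet<KeyValuePair<string, string>>();
        set.Add(new KeyValuePair<string, string>("key", "value"));

        var probe = new KeyValuePair<string, string>(
            RuntimeString("key"), RuntimeString("value"));
        return set.Contains(probe);
    }

    private static void Main() {
        // The literal and the run-time string are different objects...
        Console.WriteLine(object.ReferenceEquals("key", RuntimeString("key")));
        // ...yet the pair is still found, because strings compare by value.
        Console.WriteLine(ContainsRuntimeBuiltPair());
    }
}
```

This should print False followed by True: the two "key" strings are distinct objects, but the pair lookup succeeds because both equality and hashing of strings are based on their contents.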
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApplication1 {
    internal class Program {
        static void Main(string[] args) {
            var key = default(string);
            var value = default(string);
            var collection = new HashSet<KeyValuePair<string, string>>();

            // Fill the collection. The key only changes on even i,
            // so each key is shared by two consecutive values.
            for (var i = 0; i < 5000000; i++) {
                if (key == null || i % 2 == 0) {
                    key = "k" + i;
                }
                value = "v" + i;
                collection.Add(new KeyValuePair<string, string>(key, value));
            }

            // Time the lookups. Only the even-numbered pairs match,
            // so about half of the lookups hit.
            var found = 0;
            var sw = new Stopwatch();
            sw.Start();
            for (var i = 0; i < 5000000; i++) {
                if (collection.Contains(new KeyValuePair<string, string>("k" + i, "v" + i))) {
                    found++;
                }
            }
            sw.Stop();

            Console.WriteLine("Found " + found);
            Console.WriteLine(sw.Elapsed);
            Console.ReadLine();
        }
    }
}