I have a need to do very quick prefix "sql like"
searches over a hundreds of thousands of keys. I have tried doing performance tests using a SortedList, a Dictionary, and a SortedDictionary, which I do like so :
var dictionary = new Dictionary<string, object>();
// add a million random strings
var results = dictionary.Where(x=>x.Key.StartsWith(prefix));
I find that that they all take a long time, Dictionary
is the fastest, and SortedDictionary
the slowest.
Then I tried a Trie implementation from http://www.codeproject.com/Articles/640998/NET-Data-Structures-for-Prefix-String-Search-and-S which is a magnitude faster, ie. milliseconds instead of seconds.
So my question is, is there no .NET collection I can use for the said requirement? I would have assumed that this would be a common requirement.
My basic test :
class Program
{
static readonly Dictionary<string, object> dictionary = new Dictionary<string, object>();
static Trie<object> trie = new Trie<object>();
static void Main(string[] args)
{
var random = new Random();
for (var i = 0; i < 100000; i++)
{
var randomstring = RandomString(random, 7);
dictionary.Add(randomstring, null);
trie.Add(randomstring, null);
}
var lookups = new string[10000];
for (var i = 0; i < lookups.Length; i++)
{
lookups[i] = RandomString(random, 3);
}
// compare searching
var sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
var exists = dictionary.Any(k => k.Key.StartsWith(lookup));
}
sw.Stop();
Console.WriteLine("dictionary.Any(k => k.Key.StartsWith(randomstring)) took : {0} ms", sw.ElapsedMilliseconds);
// test other collections
sw.Restart();
foreach (var lookup in lookups)
{
var exists = trie.Retrieve(lookup).Any();
}
sw.Stop();
Console.WriteLine("trie.Retrieve(lookup) took : {0} ms", sw.ElapsedMilliseconds);
Console.ReadKey();
}
public static string RandomString(Random random,int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
}
Results:
dictionary.Any(k => k.Key.StartsWith(randomstring)) took : 80990 ms
trie.Retrieve(lookup) took : 115 ms