
In C#, I have some static data that could be put in a Dictionary<int, T> where T is some reference type. The web app only needs to initialize it once, statically (it doesn't change).

Since I don't have to worry about insert or delete performance, what is the best data structure to use (or should I roll my own)? I'm probably looking at something like ~100,000 entries, fairly evenly spaced.

I am looking for an optimal algorithm for fetching this data. Dictionary<> isn't bad, but I would imagine there must be something out there optimized for read-only data.

I suspect, but haven't confirmed, that the range of these keys might be 0 to 400,000. If that were the case, how would the recommendations change? (I have a thought that I will post as a possible answer.)


Maybe I could:

  1. Scan through the data once and grab the highest key
  2. Allocate an array with the size of the highest key + 1.
  3. Take a second pass and store the data in the array.

Would this be better or worse than a Hashtable / Dictionary with a reasonable load factor?
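The two-pass idea above can be sketched as follows (the sample entries and names are hypothetical stand-ins for the real ~100,000-entry data set):

```csharp
using System.Collections.Generic;

class DirectIndexLookup
{
    // Hypothetical source data; in practice this would be the app's static entries.
    static readonly KeyValuePair<int, string>[] source =
    {
        new KeyValuePair<int, string>(3, "three"),
        new KeyValuePair<int, string>(7, "seven"),
        new KeyValuePair<int, string>(400, "four hundred"),
    };

    static readonly string[] table = Build();

    static string[] Build()
    {
        // Pass 1: scan once to find the highest key.
        int max = 0;
        foreach (var pair in source)
            if (pair.Key > max) max = pair.Key;

        // Pass 2: allocate (highest key + 1) slots and store each value at its key.
        var result = new string[max + 1];
        foreach (var pair in source)
            result[pair.Key] = pair.Value;
        return result;
    }

    // Lookup is a plain array index instead of a hash probe; null means "no entry".
    public static string Get(int key) =>
        (key >= 0 && key < table.Length) ? table[key] : null;
}
```

The trade-off is memory: every absent key between 0 and the maximum still costs one reference-sized slot.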

Adam Lear
Adam Tegen
  • Seems rather unlikely that the lookup performance of a single dictionary with 100K entries is significantly impacting the overall performance of your application. – Michael Petito Dec 20 '11 at 03:15
  • The application is effectively a search page that searches this dictionary. I am interested in the answer both for this app and as a thought experiment. – Adam Tegen Sep 10 '12 at 16:43

2 Answers


A Dictionary is the right way to go. Here is a quote from MSDN:

The Dictionary(Of TKey, TValue) generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary(Of TKey, TValue) class is implemented as a hash table.

So it takes some time up front to build the dictionary (computing hashes and filling the buckets), but reads by key will be blazingly fast.
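Since the data never changes, that one-time build cost can be paid in a static initializer; a minimal sketch (the class name and sample entries are hypothetical, and presizing with the capacity constructor avoids rehashing during the build):

```csharp
using System.Collections.Generic;

static class StaticData
{
    // Built once when the type is initialized; never mutated afterwards.
    static readonly Dictionary<int, string> map = Build();

    static Dictionary<int, string> Build()
    {
        // Presize to the known entry count so the table never has to grow.
        var d = new Dictionary<int, string>(100000);
        d.Add(1, "one");    // stand-ins for the real ~100,000 entries
        d.Add(2, "two");
        d.Add(3, "three");
        return d;
    }

    // Close to O(1): one hash computation plus a bucket probe.
    public static string Get(int key) => map[key];
}
```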

Edit

If more than ~50% of the keys in the 0-400,000 range are actually present, it makes sense to go with a simple array where the key is the item's index; that gives you guaranteed O(1) lookups. But according to your question only about 25% of the keys would be present, so the array would waste 75% of its slots. I would go with Dictionary<,> in that case: I doubt its per-entry memory overhead comes anywhere near what the mostly-empty array would waste.

Haus
Restuta
  • Be sure to use the TryGetValue method to access the keys. This way you only have to make one call to check if a key exists AND get the value. – Ruzzie Dec 20 '11 at 06:37
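The TryGetValue pattern the comment describes looks like this (the sample dictionary is just for illustration):

```csharp
using System;
using System.Collections.Generic;

class TryGetDemo
{
    static void Main()
    {
        var map = new Dictionary<int, string> { { 42, "answer" } };

        // One hash probe instead of ContainsKey followed by the indexer (two probes).
        string value;
        if (map.TryGetValue(42, out value))
            Console.WriteLine(value);      // prints "answer"
        else
            Console.WriteLine("not found");
    }
}
```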

If it's really a dictionary, a trie works reasonably well. A Dictionary (a hash table) is another possibility, as long as you fine-tune it. Which would be faster... I don't know; you'd need to profile it, I guess. Space-wise, the trie wins hands down. I don't think .NET has a trie in its standard library, but there should be some implementations floating around.

Amadan