4

I have a somewhat sizeable Hashtable populated with lookup only static data.

This means that when the program starts, I'll either have a long initializer/constructor method that will execute many hashtable.Add() methods (yuk) or de-serialize from a resource file that I've custom generated at coding time.

Is there an attribute or another way I could use to include this data at compile-time?

MandoMando
  • 5,215
  • 4
  • 28
  • 35
  • Maybe you can clarify if its the amount of code you write, or the speed of the code that you're worried about? – MerickOWA Dec 03 '10 at 18:31
  • Thanks MerickOWA. It's both. Having to write a custom program just to get the serialized data is, well, not elegant. Populating the entries one at the time at run-time, is less coding, but adds to load time. – MandoMando Dec 03 '10 at 18:51
  • I don't think you're going to find small/fast/elegant all in one solution unless its a programming language designed just for that purpose. – MerickOWA Dec 03 '10 at 19:04

5 Answers5

4

If your data is strictly static (or static enough that you can hard-code it in your program), then you could just put all of your values in a switch statement. Can't really say if it is a good idea or not, but it doesn't seem any worse than loading up a HashTable. On the plus side, the "hash table" initialization does become a strictly compile time operation:

public int Lookup(int key)
{
  switch (key)
  {
    case K1: return V1;
    case K2: return V2;
    case K3: return V3;
    case K4: return V4;
    case K5: return V5;
    case K6: return V6;
    case K7: return V7;
    default: return V_WHOOPS;
  }
}

If the number of values is large, you could write a script to generate the code rather than type it in by hand.

wageoghe
  • 27,390
  • 13
  • 88
  • 116
  • 1
    wageoghe, executing ~500 conditionals with every lookup is not good form. For under 20 items, maybe. – MandoMando Dec 03 '10 at 18:59
  • I don't think that that many conditionals will actually happen. See this post for a discussion of the performance of switch vs if. http://stackoverflow.com/questions/445067/if-vs-switch-speed – wageoghe Dec 03 '10 at 19:02
  • http://stackoverflow.com/questions/395618/if-else-vs-switch Here is another post that says (in ima's answer) that in release mode a switch statement lookup can be O(1). I can't say that the switch statement will be faster than a HashTable, but it should be easy enough to test. If it is faster, use it. If not, you are back to your original problem of figuring out how to populate your HashTable at runtime. – wageoghe Dec 03 '10 at 19:10
  • 5
    A critical aspect for whether an integer based switch statement compiles to a jump table is the density of the case values. If the values are sequential or nearly squential, you have a much greater chance that the compiler will opt for a jump table. If the values are all over the place with large gaps between values, the jump table will be too expensive in terms of memory use / code size. If you can get it to produce a jump table, the performance will be in constant time regardless of how many cases there are. – dthorpe Dec 03 '10 at 19:14
  • The keys are of string type so a jump table is out of the question. Otherwise, I'd have just built an array. I remember Bill Joy talking about switch vs if-else in the early years. – MandoMando Dec 03 '10 at 19:43
  • 2
    You are probably right since your keys are strings. FWIW, I did a rough test where I created a switch-based lookup function with 500 string cases and a HashTable filled with 500 strings and then for each lookup method did 10,000,000 lookups (only of valid keys that ). The switch statement lookup took an average of 3200 ms for 10,000,000 lookups while the HashTable lookup took an average of 2400 ms. So, while the switch based lookup took an average of 50% longer, over 10,000,000 lookups the total difference was only 1200 ms. – wageoghe Dec 03 '10 at 20:30
  • So, the HashTable approach is probably better, but the switch method actually did perform better than I would have guessed. – wageoghe Dec 03 '10 at 20:31
  • @wageoghe: wow, that says a lot about the hashtable implementation, ha? O(1) is barely able to beat O(n) at 500 entries. Interesting experiment. – MandoMando Dec 06 '10 at 15:27
  • It was a pretty quick and dirty test. I started doing it based on integers and changed it when you said your keys were strings. So, the strings that I used were just the same integers I was using converted to strings. I generated the integers by starting at 0 and adding a random number between 1 and 75 and then saved the integers as strings. The lengths varied from 1 character to 5 characters. I saved the key values in a list and randomly looked up each item by picking a key from index 1-500. The intent was to not bias the test by having a lot of misses. – wageoghe Dec 06 '10 at 15:50
  • Anyway, I found it pretty interesting as well. I wasn't sure what to expect with the test after you said that you keys were strings. – wageoghe Dec 06 '10 at 15:52
2

Depending on what you have in the table. You can always use a resource file.

http://msdn.microsoft.com/en-us/library/ekyft91f.aspx

kemiller2002
  • 113,795
  • 27
  • 197
  • 251
  • I already talked about a resource file in the question. Do you know of a way to automatically create resource data? – MandoMando Dec 03 '10 at 18:52
  • Well it's an xml file, so you could set up an option as part of the pre-build process to create the file, and then compile it. That's about the best way I can think about doing it. – kemiller2002 Dec 03 '10 at 18:58
1

Theres no way to conceptually setup an object at compile time. The object must be allocated/constructed by .NET runtime then filled out with data somehow.

As far as how do you make this faster, you could try serializing the Hashtable to a binary file after all the slow Add calls have been made on it.

Then in your main application you can just serialize it back when it needs it.

This would hopefully give you faster initialization of your HashTable as opposed to many .Add calls.

MerickOWA
  • 7,453
  • 1
  • 35
  • 56
0

A way you could handle this, if you don't want to load a Hashtable at runtime is to store the data in a database and then perform a lookup either with queries or LINQ. There are several options out there, SQLLite , SqlServerCE if you just want something desktop oriented. You could also use something more robust like SqlServer or MySql depending on the scope of your application.

pstrjds
  • 16,840
  • 6
  • 52
  • 61
  • That would still be populated at runtime. – Brian Rasmussen Dec 03 '10 at 18:19
  • Sure it would, but it would not require a big list of _values.Add calls in the constructor. Just one way of handling it, if the list is something that would change frequently, I sure wouldn't hard code it. But if its a permanent list, this would work. I don't know that it would be any less performative than loading the resources from a file, even if you built the collection and serialized it and then loaded it from the serialized version, you still have to fill the collection. – pstrjds Dec 03 '10 at 18:21
  • A `HashSet` is not the same concept as a `HashTable` though. – Nick Larsen Dec 03 '10 at 18:28
  • @Nick you are correct, my bad on the HashSet, will remove that answer since I don't see a way to initialize a dictionary without multiple add calls. – pstrjds Dec 03 '10 at 18:32
  • `{ x, y, z }` is same as `Add(x) etc` it was asked about alternatives. – Andrey Dec 03 '10 at 18:32
  • @Andrey I removed the HashSet since I was mistakenly in my head equating HashSet to HashTable. I believe that the AddRange calls to collections are in general more performative than a bunch of individual calls to Add (I have not tested this, so this is assumption on my part). I was looking at the question from the performance case. As far as I know a single call is more performative than multiple calls, especially if its a large number of calls. – pstrjds Dec 03 '10 at 18:35
0

If you want an instance of HashTable at run-time, you must allocate and populate that instance of HashTable at run-time.

If you want to make a decision with a value, you could write a method.

Amy B
  • 108,202
  • 21
  • 135
  • 185