
I have a large number of objects, each with a unique GUID, and I need to map objects by this GUID. I currently use System.Collections.Hashtable. The problem is that as objects are added, the hashtable resizes, which causes fragmentation of the Large Object Heap. It also needs twice as much memory as I have objects. I need to reduce the memory usage.

Features of the data structure I need:

  • Add object
  • Remove object by ID
  • Find object by ID
  • Run though all the objects in data structure (foreach)
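For reference, here is roughly what those four operations look like with my current System.Collections.Hashtable approach (`Item` is a placeholder for my actual object type):

```csharp
using System;
using System.Collections;

class Item
{
    public Guid Id = Guid.NewGuid();   // each object has a unique GUID
}

class Program
{
    static void Main()
    {
        var map = new Hashtable();     // current approach

        // Add object
        var item = new Item();
        map[item.Id] = item;

        // Find object by ID (non-generic Hashtable requires a cast)
        var found = (Item)map[item.Id];

        // Run through all the objects (foreach)
        foreach (DictionaryEntry entry in map)
            Console.WriteLine(entry.Key);

        // Remove object by ID
        map.Remove(item.Id);
    }
}
```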

What is the best data structure for this purpose? I know there are red-black and AVL trees, but I don't know which tree is better to use. Maybe there is another tree data structure suitable for mapping by unique identifiers or strings? Which data structure would work faster?

PSsam
  • What do your objects contain? – alexn Jun 20 '12 at 07:23
  • try to look at btree, but I'm not sure you gain so much in term of memory. http://en.wikipedia.org/wiki/B-tree – Felice Pollano Jun 20 '12 at 07:24
  • I cannot answer your actual question, but I would use Dictionary<> instead of Hashtable. Dictionary is generic and you probably save a lot of type casts which saves you CPU time. – Oliver Kötter Jun 20 '12 at 07:26
  • CPU is not so critical as memory, so it is no difference for me to use Dictionary or Hashtable. – PSsam Jun 20 '12 at 07:31
  • @PSsam how much data are we talking about here? http://www.phase9studios.com/post/2008/01/08/DictionaryVSHashTable.aspx and http://stackoverflow.com/questions/1089132/net-hashtable-vs-dictionary-can-the-dictionary-be-as-fast – adt Jun 20 '12 at 07:38
  • Dictionary is more memory efficient, too - as it does no box/unbox. Hashtable must keep an object around for every GUID. You can preallocate the size properly - pass 750,000 as the parameter on creation and you should be good with 100k to 500k objects. – TomTom Jun 20 '12 at 07:43
  • agree with @TomTom. But I am curious if something like Redis would be any help in this case ? – adt Jun 20 '12 at 07:48
  • Dictionary contains 2 arrays, int[] buckets and Dictionary.Entry[] entries; both will grow and cause fragmentation. Hashtable has only one array, Hashtable.bucket[] buckets. I had no opportunity to compare these data structures on my project because of lack of time, but I am almost certain that Dictionary will cause the same problems. – PSsam Jun 20 '12 at 07:53
  • @adt, There is no official support for Windows builds of Redis. – PSsam Jun 20 '12 at 08:01

2 Answers


Is the data structure static or dynamic? If it's static, consider using perfect hashing. You'll get the benefits of the hashtable without as much memory overhead.

Don't count on trees to solve your problem... they also have reasonably high memory overhead, and tend to be slower for queries and updates.
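To illustrate the idea — this is only a toy sketch, not a real minimal-perfect-hash construction such as CHD or FKS — for a *static* key set you can search offline for a (seed, table size) pair that maps every GUID to a distinct slot, then keep everything in two flat arrays with no per-entry overhead:

```csharp
using System;
using System.Linq;

// Toy sketch: brute-force search for a collision-free (seed, size)
// pair over a static set of GUID keys, then store the objects in
// flat arrays. Production perfect-hashing schemes are far more
// sophisticated; this only demonstrates the principle.
class PerfectMap<T>
{
    private readonly Guid[] keys;
    private readonly T[] values;
    private readonly int seed, size;

    public PerfectMap(Guid[] ks, T[] vs)
    {
        // Try table sizes from n upward until some seed is collision-free.
        for (size = ks.Length; ; size++)
        {
            for (seed = 1; seed < 1000; seed++)
            {
                var slots = ks.Select(k => Slot(k)).ToArray();
                if (slots.Distinct().Count() == ks.Length)
                {
                    keys = new Guid[size];
                    values = new T[size];
                    for (int i = 0; i < ks.Length; i++)
                    {
                        keys[slots[i]] = ks[i];
                        values[slots[i]] = vs[i];
                    }
                    return;
                }
            }
        }
    }

    // Multiplicative hash; unchecked uint arithmetic wraps harmlessly.
    private int Slot(Guid k) =>
        (int)(((uint)k.GetHashCode() * (uint)seed) % (uint)size);

    public T Find(Guid k) =>
        keys[Slot(k)].Equals(k) ? values[Slot(k)] : default(T);
}
```

Note that lookups stay O(1) with a guaranteed single probe, but the structure must be rebuilt from scratch whenever the key set changes — hence "static only".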

Yaniv

500,000 entries in a hashtable is really not a lot. Just tell the hashtable it's going to be large when you create it:

var myDict = new Dictionary<Guid, MyObject>(1000000); // MyObject stands in for your value type

This will create a dictionary with room for close to 1,000,000 elements. It will only be resized when you get close to 1,000,000. The old non-generic Hashtable gives you even more control, allowing you to specify a load factor to control reallocations. See the Hashtable constructor documentation.

zmbq
  • 500,000 is a normal state, but there are some cases in my project when the Hashtable takes 40 MB. I'm not asking how to use Dictionary or about its benefits; I know this data structure. Yes, maybe it's better than Hashtable. But what I want is to try other approaches. I don't believe that Dictionary is the only solution. – PSsam Jun 20 '12 at 11:47
  • A Dictionary is a Hashtable. Are you running this on some mobile device, so that 40MB is an issue for you? – zmbq Jun 20 '12 at 12:22