1

What is the best performance optimized alternative to choose among Dictionary<TKey,TValue>, HashSet<T> and List<T> with respect to :

Add values(without duplicates)

LookUps

Delete values.

I have to avoid adding duplicate values to the collection I know HashSet is good, since it skips the add if a duplicate is detected, a dictionary on the other hand throws an exception if a duplicate is found. List will need an additional ifExists check on the existing items before adding the value. But adding values in a HashSet<T> without duplicates seems to take around 1 min for 10K records. Is there a way to optimize this.

Community
  • 1
  • 1
Maverick
  • 1,396
  • 5
  • 22
  • 42
  • What is your collection going to be storing? Which operations will you be doing on the data? – Yuval Itzchakov Jul 26 '15 at 08:17
  • What are your performance numbers for all of those data structures? How did you do your benchmark? – Patryk Ćwiek Jul 26 '15 at 08:20
  • @YuvalItzchakov collection will be storing a class type which'll have three string and 2 int members, and operations will include set/reset the int column values based on comparison of corresponding string values. – Maverick Jul 26 '15 at 08:22
  • 1
    "a dictionary on the other hand throws an exception if a duplicate is found" - not if you use the indexer. Do you have a value to go with each key? I can see the choice between list/set being reasonable, but usually between set/dictionary is obvious based on whether or not you're trying to associate a value with each key. – Jon Skeet Jul 26 '15 at 08:22
  • 1
    So you'll also need to extract the values from the store. `HashSet` won't help you there. Use a `Dictionary` with a `ContainsKey` check if needed. – Yuval Itzchakov Jul 26 '15 at 08:23
  • 1
    Please add more context into your question, including performance criteria. ("seems to take a lot of time" isn't a precise indication of what you've observed or your requirements). – Jon Skeet Jul 26 '15 at 08:24
  • Possible duplicate of [HashSet versus Dictionary w.r.t searching time to find if an item exists](http://stackoverflow.com/questions/2728500/hashsett-versus-dictionaryk-v-w-r-t-searching-time-to-find-if-an-item-exist) – Oofpez Feb 21 '17 at 07:17

3 Answers3

4

Ok ... In terms of theory, all data structures that you talked about (HashSet, Dictionary and List) have asimptotically O (1 ) time complexity for adding items. Hashing data structures have O(1) for delete also. For Lists, depends a lot where are you perfoming the delete operation: if you remove at a random "i" position then you have O(N) complexity due to the fact that all items from i+1 to the end of the list must be shifted to the left by one position. If you remove always the last element then it is an O(1) complexity.

But most important, data strucures that are based on hashing have a big bonus : O (1) lookup complexity. But this is only in theory. In practice if you define a very bad hashcode for your types you could fallback to an O(N) complexity. A simple example would be overriding gethashcode function and returning a constant int. I suspect that your bad performance comes from a bad GetHashCode design.

Another thing to remember: dictionary and HashSet are data structures to be used in different scenarious. You could view Dictionary as a kind of array for wich the index can be any type and HashSet a special list that does not allow duplicates

George Lica
  • 1,798
  • 1
  • 12
  • 23
3

This perfectly answers the performance stats for Dictionary, List and HashSet w.r.t: Add, LookUp and Remove

http://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/

Lester
  • 56
  • 8
1

When it comes to performance and storing unique values I would prefer a Hash set or dictionary depending on my requirement. hashSet is used when you dont have a key value pair to enter and still you dont want a duplicate in your collection. So, hashset is a collection to store unique values wihtout a key value pair. where as when I have a pair of key and value i prefer dictionary to store unique values.

DeshDeep Singh
  • 1,817
  • 2
  • 23
  • 43