2
var usedIds = list.Count > 20 ? new HashSet<int>() as ICollection<int> : new List<int>();

Assuming that List is more performant with 20 or less items and HashSet is more performant with greater items amount (from this post), is it efficient approach to use different collection types dynamicaly based on the predictable items count?

All of the actions for each of the collection types will be the same.

PS: Also i have found HybridCollection Class which seems to do the same thing automaticaly, but i've never used it so i have no info on its performance either.

EDIT: My collection is mostly used as the buffer with many inserts and gets.

Community
  • 1
  • 1
Alexander Smirnov
  • 1,573
  • 12
  • 23
  • 2
    Are you sure that the lists are where you are going to experience your slow downs? Writing clean maintainable code might be more advantageous than dynamically generating types based on length. – crthompson Nov 07 '13 at 20:34
  • Yes, i'm looking into the complex calculations logic and trying to optimize as much as i can :) CLean code is not the problem, i'm trying to achieve some balance also :) – Alexander Smirnov Nov 07 '13 at 20:39
  • 5
    "How you are using this collection" is a better question than "how many items are in this collection" in most cases. – Tim S. Nov 07 '13 at 20:42
  • Making choices adaptively is often a good idea, yes. Whether your particular adaptive choice is a good one is not clear without a lot more context about your application. Measure it and find out. – tmyklebu Nov 07 '13 at 20:53
  • premature optimization.... well microoptimization is not helpfull at all for 99.9 % of the applications. – quadroid Nov 07 '13 at 21:07

6 Answers6

3

In theory, it could be, depending on how many and what type of operations you are performing on the collections. In practice, it would be a pretty rare case where such micro-optimization would justify the added complexity.

Also consider what type of data you are working with. If you are using int as the collection item as the first line of your question suggests, then the threshold is going to be quite a bit less than 20 where List is no longer faster than HashSet for many operations.

In any case, if you are going to do that, I would create a new collection class to handle it, something along the lines of the HybridDictionary, and expose it to your user code with some generic interface like IDictionary.

And make sure you profile it to be sure that your use case actually benefits from it.

There may even be a better option than either of those collections, depending on what exactly it is you are doing. i.e. if you are doing a lot of "before or after" inserts and traversals, then LinkedList might work better for you.

Gerald
  • 23,011
  • 10
  • 73
  • 102
1

HashSet is for faster access, but List is for insert. If you don't plan adding new items, use HashSet, otherwise List.

alexmac
  • 19,087
  • 7
  • 58
  • 69
  • 1
    Inserting items to a list, anywhere other than the end, is an O(n) operation. Inserting items to a HashSet is not only possible, it's O(1). Whether or not you plan to insert items is not the distinction, but rather whether or not there is an inherent ordering to the items that must be maintained, whether there can/should be dulpicates, whether there exists sensible and efficient hashing/equality functions, etc. – Servy Nov 07 '13 at 20:54
1

If you collection is very small then the performance is virtually always going to be a non-issue. If you know that n is always less than 20, O(n) is, by definition, O(1). Everything is fast for small n.

Use the data structure that most appropriate represents how you are conceptually treating the data, the type of operations that you need to perform, and the type of operations that should be most efficient.

Servy
  • 202,030
  • 26
  • 332
  • 449
1

Hashtables like Hashset<T> and Dictionary<K,T> are faster at searching and inserting items in any order.

Arrays T[] are best used if you always have a fixed size and a lot of indexing operations. Adding items to a array is slower than adding into a list due to the covariance of arrays in c#.

List<T> are best used for dynamic sized collections whith indexing operations.

I don't think it is a good idea to write something like the hybrid collection better use a collection dependent on your requirements. If you have a buffer with a lof of index based operations i would not suggest a Hashtable, as somebody already quoted a Hashtable by design uses more memory

quadroid
  • 8,444
  • 6
  • 49
  • 82
  • 1
    Small correction, hash based collections are faster inserting items in a *no* specific order. – nawfal May 25 '14 at 11:12
  • `If you dont care about the order of your elements use a Collection` - why? `Collection` preserves insertion order. – nawfal May 29 '14 at 15:44
  • @nawfal it does yes, but it does not enforce it for all derived types. but you are right, that does not make sense i edited my answer – quadroid May 29 '14 at 21:20
0

is it efficient approach to use different collection types dynamicaly based on the predictable items count?

It can be depending on what you mean by "efficiency" (MS offers HybridDictionary class for that, though unfortunately it is non generic). But irrespective of that its mostly a bad choice. I will explain both.

From an efficiency standpoint:

  1. Addition will be always faster in a List<T>, since a HashSet<T> will have to precompute hash code and store it. Even though removal and lookup will be faster with a HashSet<T> as size grows up, addition to the end is where List<T> wins. You will have to decide which is more important to you.

  2. HashSet<T> will come up with a memory overhead compared to List<T>. See this for some illustration.

But however, from a usability standpoint it need not make sense. A HashSet<T> is a set, unlike a bag which List<T> is. They are very different, and their uses are very different. For:

  1. HashSet<T> cannot have duplicates.

  2. HashSet<T> will not care about any order.

So when you return a hybrid ICollection<T>, your requirement goes like this: "It doesn't matter whether duplicates can be added or not. Sometimes let it be added, sometimes not. Of course iteration order is not important anyway" - very rarely useful.

Good q, and +1.

nawfal
  • 70,104
  • 56
  • 326
  • 368
-2

HashSet is better, because it will probably use less space, and you will have faster access to elements.

Dejan
  • 3,046
  • 3
  • 28
  • 43
  • 3
    A `HashSet` will, in most all cases, use more memory than a corresponding `List`. This is by design. It trades faster lookup speed for a larger memory footprint. – Servy Nov 07 '13 at 20:57
  • Hash function implemented internally will get you element faster than iterating through whole list. – Dejan Nov 07 '13 at 20:59
  • Yes, it generally will, but the hash set will use more memory to do it. It will use *more* space, but have faster access. Also, this answer doesn't address the question at all in that a hash set isn't always faster for very small data sets. – Servy Nov 07 '13 at 21:00
  • Yes but if number of elements is large HashSet is much, much better. – Dejan Nov 07 '13 at 21:02
  • 2
    That depends on a number of factors. Searching for items is likely to be faster, for example. But the hash set will use quite a bit more memory, often even twice as much, as a straight list. It also depends on the operations performed on the collection. If one is only adding to the end and iterating the whole thing a list is likely to be faster. – Servy Nov 07 '13 at 21:04
  • HasSet is not a faster than a list if i try to add a whitespace for 1 million items. its not about the size! – quadroid Nov 07 '13 at 21:08
  • It is probably the same on your PC because you have cache on it, and sequential access is faster, but generally you should use HashSet. – Dejan Nov 07 '13 at 21:23
  • i don't know why i should use a hashset at all if i don't plan to access a specific item at any time. Do not generally use a HashSet. It's about the purpose of the collection nothing else. But i think we can stop this here you got your oppinion and i got mine. – quadroid Nov 07 '13 at 21:33