4

I have a string that is split into an integer array. The size of this array can be 1 item up to 100,000 (or larger). Order is not important.

The objective is to use the object later to determine if the value is present. A separate loop will be testing against this object to see if the item exists (the loop will iterate more than the number of items in testmeInt array).

Way 1: Array

Attempt to select the integer, catch an error

Way 2: Dictionary

var testme = "12,23".Split(',');
int[] testmeInt = Array.ConvertAll<string, int>(testme, int.Parse);
Dictionary<int, int> TestMeDict = new Dictionary<int, int>();

foreach (int item in testmeInt)
{
    TestMeDict.Add(item, 0);
}


for (int i = 1; i <= 50000000; i++)
{
    if (TestMeDict.ContainsKey(i) == true) {  
        //It Exists
    }
}

My guess is that the Way2 using Dictionary will be the fastest. This question is similar to mine, but don't cover my exact use case.

Community
  • 1
  • 1
Ariesto
  • 121
  • 4
  • 11
  • 14
    Don't guess, test. Also, can you use `HashSet`? – Jon Skeet Apr 04 '13 at 21:03
  • 3
    [Which is faster](http://ericlippert.com/2012/12/17/performance-rant/) – Austin Salonen Apr 04 '13 at 21:07
  • Agree with @JonSkeet - you need to run your own performance tests here. Also, don't just look at the BCL collections - take a look at http://billmill.org/bloomfilter-tutorial/ for example... – JerKimball Apr 04 '13 at 21:16
  • @JonSkeet, agree HashSet is the fastest way to go. It just can't get any quicker. – Justin Apr 04 '13 at 21:52
  • For 1 item an array will be faster. For 100,000, a hash table will be. Somewhere you will have to decide the possibility of size. In a general sense go for latter. – nawfal May 26 '14 at 06:41

2 Answers2

1

Assuming the values are unique then quite often the Dictionary will be the fastest way of checking for an item being present in a set because of hash lookups.

Direct equality checks aren't needed on the whole stored item (the value) simply the key and this is dealt with as a hash by the dictionary under the hood.


As mentioned by Jon Skeet HashSet is a better way of going if you're just storing ints and don't need direct key-based lookup.

Clint
  • 6,133
  • 2
  • 27
  • 48
  • [`Hashtable`](http://msdn.microsoft.com/en-us/library/system.collections.hashtable.aspx) != [`HashSet`](http://msdn.microsoft.com/en-us/library/bb359438.aspx). The hashtable is the weakly typed parent of the `Dictionary`. The hashset is like a dictionary with unique values only. – Tim Schmelter Apr 04 '13 at 21:05
1

Thanks everyone for the tips. Researching HashSet I came across the below post that directly answers my question.

With my requirements technically HashSet is the answer. However, the performance tests in the blog below illustrated that my Way 2 Dictionary.Key method has the same performance as Hashset (0 ms).

http://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/

HashSet<T> versus Dictionary<K, V> w.r.t searching time to find if an item exists

Community
  • 1
  • 1
Ariesto
  • 121
  • 4
  • 11