13
var dict = new Dictionary<int, string>();
for (int i = 0; i < 200000; i++)
    dict[i] = "test " + i;

I iterated this dictionary using the code below:

foreach (var pair in dict)
    Console.WriteLine(pair.Value);

Then, I iterated it using this:

foreach (var key in dict.Keys)
    Console.WriteLine(dict[key]);

And the second iteration took ~3 seconds less. I can get both keys and values via both methods. What I wonder is whether the second approach has a drawback. Since the most rated question that I can find about this doesn't include this way of iterating a dictionary, I wanted to know why no one uses it and how does it work faster.

Community
  • 1
  • 1
Şafak Gür
  • 7,045
  • 5
  • 59
  • 96
  • 4
    I don't like your use of the console for this. What happens if you switch the two around and test the `dict.Keys` approach before the other one? – Blindy Jul 17 '12 at 19:23
  • @Blindy given the size of the dictionary I should imagine the console would *not* have an impact (ie, any output buffering would even out between the two), but it's worth a look at the generated IL to see what's different – Daniel DiPaolo Jul 17 '12 at 19:24
  • @Lee The accepted answer iterates the pairs, not the keys right? – NominSim Jul 17 '12 at 19:26
  • 1
    In my test, the second method (accessing `dict[key]`) was slower. The time taken to call `Console.WriteLine` (which does I/O) probably dominates this performance test and is affecting your measurement of the enumeration. – Bradley Grainger Jul 17 '12 at 19:26
  • @Bradley so using console for these kind of tests is wrong. Thanks. – Şafak Gür Jul 17 '12 at 19:31
  • 1
    @d4wn, yeah, you should avoid anything you don't have control over, and IO gets buffered by the framework, again by windows, there may or may not be graphics card latency as you're uploading the new data from forced repaints over, etc, which will trick you if you think you're measuring dictionary performance :) – Blindy Jul 17 '12 at 19:34
  • If you just want to write out all of the values why not `foreach` the `Values` property rather than the `Keys` or the pairs? – Servy Jul 17 '12 at 19:34
  • @Servy I don't, I wanted both keys and values, that piece of code was just for tests. – Şafak Gür Jul 17 '12 at 19:36
  • 1
    You should also beware of micro-optimization. Is your code really in a position where a few extra milliseconds here really matter? It's certainly possible, but it's still likely enough that your program is already good enough or that there are better code snippets to optimize. – Servy Jul 17 '12 at 19:36
  • @Servy nah, I was just experimenting and doing even that wrong, it seems. – Şafak Gür Jul 17 '12 at 19:42

1 Answers1

23

Your time tests have some fundamental flaws:

  • Console.Writeline is an I/O operation, which takes orders of magnitude more time than memory accesses and CPU calculations. Any difference in iteration times is probably being dwarfed by the cost of this operation. It's like measuring the weights of pennies in a cast-iron stove.
  • You don't mention how long the overall operation took, so saying that one took 3 seconds less than another is meaningless. If it took 300 seconds to run the first, and 303 seconds to run the second, then you are micro-optimizing.
  • You don't mention how you measured running time. Did running time include the time getting the program assembly loaded and bootstrapped?
  • You don't mention repeatability: Did you run these operations several times? Several hundred times? In different orders?

Here are my tests. Note how I try my best to ensure that the method of iteration is the only thing that changes, and I include a control to see how much of the time is taken up purely because of a for loop and assignment:

void Main()
{
    // Insert code here to set up your test: anything that you don't want to include as
    // part of the timed tests.
    var dict = new Dictionary<int, string>();
    for (int i = 0; i < 2000; i++)
        dict[i] = "test " + i;
    string s = null;
    var actions = new[]
    {
        new TimedAction("control", () => 
        {
    for (int i = 0; i < 2000; i++)
            s = "hi";
        }),
        new TimedAction("first", () => 
        {
            foreach (var pair in dict)
            s = pair.Value;
        }),
        new TimedAction("second", () => 
        {
            foreach (var key in dict.Keys)
            s = dict[key];
        })
    };
    TimeActions(100, // change this number as desired.
        actions);
}


#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    foreach(var action in actions)
    {
        var milliseconds = s.Time(action.Action, iterations);
        Console.WriteLine("{0}: {1}ms ", action.Message, milliseconds);
    }

}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message {get;private set;}
    public Action Action {get;private set;}
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart(); 
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

Result

control: 1.2173ms
first: 9.0233ms
second: 18.1301ms

So in these tests, using the indexer takes roughly twice as long as iterating key-value pairs, which is what I would expect*. This stays roughly proportionate if I increase the number of entries and the number of repetitions by an order of magnitude, and I get the same results if I run the two tests in reverse order.

* Why would I expect this result? The Dictionary class probably represents its entries as KeyValuePairs internally, so all it really has to do when you iterate it directly is walk through its data structure once, handing the caller each entry as it comes to it. If you iterate just the Keys, it still has to find each KeyValuePair, and give you the value of the Key property from it, so that step alone is going to cost roughly the same amount as iterating across it in the first place. Then you have to call the indexer, which has to calculate a hash for provided key, jump to the correct hashtable bucket, and do an equality check on the keys of any KeyValuePairs it finds there. These operations aren't terribly expensive, but once you do them N times, it's roughly as expensive as if you'd iterated over the internal hashtable structure again.

StriplingWarrior
  • 151,543
  • 27
  • 246
  • 315
  • Just curious, why would you expect that? I would expect much less than a 100% increase in time. – NominSim Jul 17 '12 at 19:46
  • 1
    @NominSim In the first you are essentially getting every single item from an array, all at once, in order. In the second, you are accessing each item, individually. It will need to calculate the hash codes of every item, it will need to find that item in the array, it may not be accessing the memory sequentially (resulting in more cache misses, given that the Dictionary is large enough to not fit in the cache), and there is more overhead of function class. Just to list a few things. Having said all of that, I too thought it would be a bit less than a 100% increase. – Servy Jul 17 '12 at 19:50
  • @Servy: I think the key is that you're effectively doing all the things you mentioned *on top of* (as opposed to *instead of*) iterating across the KeyValuePairs. – StriplingWarrior Jul 17 '12 at 20:07
  • 1
    I found this useful. I was looking for the answer to which was preferable, "foreach(kvp in Dict)" or "foreach(key in Dict.keys)", and I initially found this older question - http://stackoverflow.com/questions/141088/what-is-the-best-way-to-iterate-over-a-dictionary-in-c - but this discussion helped convince me 100%. – NateJ Aug 02 '12 at 22:50
  • I would be interested to read what your views are regarding the control. This appears to be 7.4 times faster than the first method (iterating over the key-value pairs). What does this tell us? I had expected that a dictionary would be storing its key-value pairs internally in an array and the hash-table would store the addresses of the array elements. Therefore shouldn't the first result be much closer to control result? Why is the for loop so much quick? – Ben Apr 14 '15 at 11:29
  • 2
    @Ben: First: Dictionaries are optimized for constant-time insertion *and deletion*, so they are probably not using an array to track insertion order: more likely they're using a linked-list structure, which has more overhead when iterating over it. Iterating over an array list takes about 3x as long as the control: a linked list is about 4x. Second: These are very fast operations, so every little operation makes a difference. The `pair.Value` actually accounts for the rest. If you create a `new LinkedList(dict)` and iterate over that in a third test, the results are similar. – StriplingWarrior Apr 14 '15 at 15:11
  • 1
    @Ben: Thinking about this further, it also occurs to me that `KeyValuePair` is a struct, and so requires its contents to be copied when passed around. The `pair.Value` is actually negligible (in fact, it might be optimized away entirely), but there's a substantial difference between iterating a `new LinkedList(dict.Values)` versus `new LinkedList(dict)`. Thanks for the thought-provoking question. – StriplingWarrior Apr 14 '15 at 15:28
  • Thanks for this. Is it worth adding to your answer? I am interested in this because I am trying to improve the performance of a container class. Currently it contains a dictionary but it seems to be primarily used with foreach loops (like your 'first' test) and is rarely being used to actually look-up a key. To be honest I was surprised to see such a large difference in performance, and thanks to your advice it looks like an array would be a much better choice for the internal storage of this class. – Ben Apr 14 '15 at 15:46