1

I have a situation where I have a large collection of items stored in HttpApplicationState which internally uses a NameValueCollection to store key-value pairs. By large I mean in the order of hundreds of thousands of string items. In this particular scenario I am also trying to do a batch removal of keys (again, removing large chunks of items by key from the collection) but I am finding that is painfully slow to do.

I wrote the following samples to compare. The first code sample uses a NameValueCollection to remove all values by key:

NameValueCollection collection = new NameValueCollection();

// Setup
for (int i = 0; i < 100000; i++)
{
    collection.Add(i.ToString(), i.ToString());
}

// Remove
for (int i = 0; i < 100000; i++)
{
    collection.Remove(i.ToString());
}

Running this takes an age (in fact I gave up because it took too long). I then compared it with this version which uses a Dictionary<TKey, TValue>:

Dictionary<int, int> collection = new Dictionary<int, int>();

// Setup
for (int i = 0; i < 100000; i++)
{
    collection.Add(i, i);
}

// Remove
for (int i = 0; i < 100000; i++)
{
    collection.Remove(i);
}

The above sample runs so fast it might as well be instant.

So why do two different collections that to me look to do similar things work so differently?

Peter Monks
  • 4,219
  • 2
  • 22
  • 38

1 Answers1

2

Thanks to the BCL Reference Source I was able to determine why the NameValueCollection.Remove() method was taking so long. The following is a code snippet of the NameObjectCollectionBase.BaseRemove() method which is invoked:

if (name != null) {
    // remove from hashtable
    _entriesTable.Remove(name);

    // remove from array
    for (int i = _entriesArray.Count-1; i >= 0; i--) {
        if (_keyComparer.Equals(name, BaseGetKey(i)))
            _entriesArray.RemoveAt(i);
    }
}

Basically a Dictionary<TKey, TValue> works as a hash table which means that lookup by key is extremely fast. Whereas a NameValueCollection appears to work more like an array where indexes and keys are tracked. By removing hundreds of thousands of keys at a time this method will in fact loop through the entire internal array countless times to find the correct value to remove!

In the end I changed my code to not use a NameValueCollection and instead used a Dictionary<TKey, TValue> instead because of this.

Peter Monks
  • 4,219
  • 2
  • 22
  • 38
  • You've added an answer in the same minute you asked a question. What was the purpose of asking in first place, if you already had an answer ready? – Tarec May 06 '14 at 16:28
  • +1. @Tarec - I think it is quite reasonable question/answer - there similar question like [NVC vs Dictionary](http://stackoverflow.com/questions/3001108/namevaluecollection-vs-dictionarystring-string), but none goes into perf part. It is ok to answer own question, even accept it if there is no better answers... - also I'd put answer as "community wiki" to be less contentions. – Alexei Levenkov May 06 '14 at 16:37
  • I'm not against marking it as an answer. I'm against doing it in the same minute the question's been posted and him pretending to have a problem which in fact he's already solved. And on question "why is method x faster than method y" you could always answer: decompile and compare. – Tarec May 06 '14 at 17:57
  • 2
    Because according to StackOverflow [it is OK to answer your own question](http://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/) - why else is there a check-box for such a thing in the question form? I was simply providing this for information for others because this caused me issues which may help others. – Peter Monks May 07 '14 at 09:30
  • @AlexeiLevenkov I didn't think to mark the answer as Community Wiki as I've never done it before. Happy to do this – Peter Monks May 07 '14 at 09:45