4

I'm working with a framework that uses collections derived from System.Collections.CollectionBase. Users have been complaining about performance, and I feel that these collections, which are very heavily used, might be a big part of the problem. Is there a way, with a tool or profiler or in the IL, that I can get some metrics on boxing/unboxing penalties? I need evidence to back up a push for System.Collections.Generic instead. I've tried CLRProfiler, but tend to get lost and am not sure what I should be looking for.

UPDATE
Thanks all for your input so far. I am aware that this is probably not the main bottleneck, but am looking for metrics on as many possible performance killers as I can. This is just one of them, not sure how big it is, hence looking for a way to measure it.

AJ.
  • I am very sure the performance problem won't be which collection class you use. Read a few tutorials about profiling and test it. You can also take methods of your code and execute them on their own to see how long they take for 1, 10, 100, or 100,000 elements, and check whether it really matters. –  Nov 11 '10 at 14:37
  • Personally I doubt that this is the bottleneck; in particular is it structs or classes in the collection? – Marc Gravell Nov 11 '10 at 14:39
  • Classes. Lots of them. Used very heavily. I'm aware of other issues with the framework (there are many, many issues), but I'm trying to gather as many metrics as possible to convince them to upgrade. – AJ. Nov 11 '10 at 14:51
  • @AJ: If your collections are of classes (reference types), then you don't even have a boxing issue to begin with. – Dan Tao Nov 11 '10 at 14:56

4 Answers

10

While I certainly encourage you to move from non-generic to generic collections for plenty of good reasons, I honestly doubt that these collections could be the cause of your performance problems. Boxing is generally only an issue when you get to the microscopic level, needing to squeeze out tiny gains in high-performance situations. It's also good to avoid in general for GC reasons, but it's typically minor in that arena as well.

To put it another way: it's highly doubtful that boxing would cause a performance issue that your users would notice.

Obviously, I'm speaking in generalizations. Without knowing your specific scenario, I can't really say that much with any certainty.


Edit: Note that while I am skeptical your problem could be your use of non-generic collections per se, I will point out that it is very important what type of collection you use to tackle a given problem, particularly when the amount of data in the collection is large. Here are just a few examples:

  • If you are performing lookups based on a key, a hash table such as Dictionary<TKey, TValue> will significantly outperform a List<T>, because it hashes the key instead of scanning every element.
  • If you are checking for duplicates, a HashSet<T> will outperform a list-based search for the same reason.
  • If you are looking for FIFO (queue-like) behavior, a Queue<T> will outperform removing items from the front of a List<T>.
  • If you are performing insertions/removals in the middle of the collection, at a node you already hold a reference to, a LinkedList<T> will outperform a List<T>, which has to shift elements.
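As a rough illustration of the first two points, here is a minimal sketch. The Customer type and the method names are made up for this example and are not taken from the asker's framework.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionChoiceSketch
{
    // Hypothetical record type, used only for this illustration.
    public sealed class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    // O(n): scans the list until an element with a matching Id is found.
    public static Customer FindInList(List<Customer> customers, int id)
    {
        return customers.Find(c => c.Id == id);
    }

    // O(1) on average: hashes the key and goes straight to its bucket.
    public static Customer FindInDictionary(Dictionary<int, Customer> byId, int id)
    {
        Customer found;
        return byId.TryGetValue(id, out found) ? found : null;
    }

    // HashSet<T>.Add returns false when the item is already present,
    // so duplicates are detected in a single pass without a nested loop.
    public static bool HasDuplicates(IEnumerable<int> values)
    {
        var seen = new HashSet<int>();
        foreach (int value in values)
        {
            if (!seen.Add(value))
            {
                return true;
            }
        }
        return false;
    }
}
```

With a handful of items the two lookups are indistinguishable; the gap only shows up as the collection grows, which is why it is worth timing with realistic sizes.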

These collections should be part of any .NET developer's (really, any developer's) set of tools. If you find yourself using a List<T> (or ArrayList) or similar data structure everywhere you utilize collections of items, that very well may cause a performance issue down the road--again, particularly when your collections are large. These are not trivial performance gains I'm talking about. So do take care to make sensible choices for your collection types.


But I'd recommend a performance profiler in general, such as ANTS (good, but not free) or EQATEC (also good and free). Just run your application under a program such as one of these and see where your bottlenecks are. My guess is that you'll find it isn't with your non-generic collections; but naturally, I could be wrong.

Dan Tao
  • Right... chances are the performance cost of which collection is being used is minor in comparison to other areas like data access or network latency. – Josh Nov 11 '10 at 14:42
  • Playing with EQATEC now. Thanks much for your detailed response. – AJ. Nov 11 '10 at 15:14
  • 1
    OMG @Dan Tao, I would +23 this if I could. Used EQATEC to find the smoking gun, and you were right, it has nothing to do with boxing and everything to do with data access. Thank you thank you thank you. – AJ. Nov 11 '10 at 17:59
2

Why not just set up a quick console app to measure the speeds of various operations? You could use a simple method like this:

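// Requires: using System; using System.Diagnostics;
// Declared static so it can be called directly from a console app's Main.
private static TimeSpan TimedAction(Action action)
{
    // Stopwatch.StartNew() creates the timer and starts it in one call.
    var timer = Stopwatch.StartNew();

    action();

    timer.Stop();

    return timer.Elapsed;
}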

And call it like this:

var elapsed = TimedAction(() =>
    {
        //Do some stuff with your collection here
    });

Console.WriteLine("Elapsed Time: {0} ms", elapsed.TotalMilliseconds);

You should be able to collect enough empirical evidence from this to figure out which collection would be faster given analogous operations. Vary the number of items, the number of consecutive operations performed, and so on.
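For instance, a comparison along these lines would show what boxing costs when the elements are value types. This is only a sketch: the class and method names are invented for the example, and the timing helper repeats the TimedAction idea so the code compiles on its own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class BoxingTimingSketch
{
    // Same idea as TimedAction, repeated so this sketch is self-contained.
    public static TimeSpan Time(Action action)
    {
        var timer = Stopwatch.StartNew();
        action();
        timer.Stop();
        return timer.Elapsed;
    }

    // ArrayList stores object, so every Add boxes the int
    // and every read needs an unboxing cast.
    public static long SumWithArrayList(int count)
    {
        var items = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            items.Add(i);
        }

        long sum = 0;
        foreach (object item in items)
        {
            sum += (int)item;
        }
        return sum;
    }

    // List<int> stores the ints directly, with no boxing.
    public static long SumWithGenericList(int count)
    {
        var items = new List<int>();
        for (int i = 0; i < count; i++)
        {
            items.Add(i);
        }

        long sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }

    public static void PrintComparison(int count)
    {
        // Warm-up pass so JIT compilation is not included in the timings.
        SumWithArrayList(count);
        SumWithGenericList(count);

        TimeSpan boxed = Time(() => SumWithArrayList(count));
        TimeSpan generic = Time(() => SumWithGenericList(count));

        Console.WriteLine("ArrayList: {0} ms", boxed.TotalMilliseconds);
        Console.WriteLine("List<int>: {0} ms", generic.TotalMilliseconds);
    }
}
```

Calling BoxingTimingSketch.PrintComparison(1000000) from Main prints both timings. If the collections hold classes rather than ints, as in the question, swap in a reference type and expect the two numbers to be much closer, since nothing is boxed.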

However, as Dan mentioned above, the time spent in the third-party collection is probably insignificant when compared to data access and network latency.

Josh
1

Here is the proof you need.

From MSDN:

In addition to type safety, generic collection types generally perform better for storing and manipulating value types because there is no need to box the value types.

Take note, though, that in real life generics are not as much faster as Microsoft says. The difference is often negligible.
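To make the value-type versus reference-type distinction concrete, here is a small sketch (the names are invented for the example). Assigning an int to object makes the C# compiler emit an IL box instruction, which you can see in a disassembler such as ildasm, whereas a class instance goes into a non-generic collection as the same reference with nothing allocated on its behalf.

```csharp
using System;
using System.Collections;

public static class BoxingSemanticsSketch
{
    // Boxing the same int twice allocates two separate heap objects,
    // so the two references are never equal.
    public static bool BoxesAreDistinct()
    {
        int value = 42;
        object first = value;   // compiles to an IL "box" instruction
        object second = value;  // a second, independent box
        return !object.ReferenceEquals(first, second);
    }

    // A reference type is stored in an ArrayList as-is: the element
    // that comes back is the very same object, so no boxing took place.
    public static bool ReferenceTypeIsStoredAsIs()
    {
        var item = new object();
        var items = new ArrayList();
        items.Add(item);
        return object.ReferenceEquals(items[0], item);
    }
}
```

Both methods return true, which matches the comment on the question: a CollectionBase-derived collection of classes pays for casts, but not for boxing.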

Joebone
Liviu Mandras
1

@Dan Tao's remarks are right on the money.

What I find myself doing a lot, in similar circumstances, is this technique, which you can do under any IDE.

So I know you want to measure a specific thing, but overall, your big concern is finding performance problems wherever they are, right?

We have debates about issues like these, but what the program is really spending time on is unrelated to that. Things like going 30 layers deep into subterranean libraries, just to do things like extract strings from resources so they can be translated into different languages, when they don't, in fact, need to be. Things like somebody sets a property to True, which sets off a chain of notifications with things getting added or removed from lists, tree-view controls being updated, windows being created and destroyed, tabs and menu items being added/removed, etc. Then a little while later, the property gets set to False again, as if it's no big deal. Things like setting cells in a grid control, where a similar tidal wave of ramifications ensues. Generally, tails wagging dogs.

That's what I mean about what's really going on. When things like boxing are a problem, samples will show it.

Mike Dunlavey