0

I have a huge legacy code base and I would like to optimize it, make it faster. For this reason I thought about looking up opportunities where I can replace list and arrays with HashSets and Dictionaries.

There is the following NDepend query under .NET Framework Usage / System.collection

// <Name>Caution with List.Contains()</Name>
let containsMethods = ThirdParty.Methods.WithFullNameIn(
   "System.Collections.Generic.List<T>.Contains(T)",
   "System.Collections.Generic.IList<T>.Contains(T)",
   "System.Collections.ArrayList.Contains(Object)")

from m in Application.Methods.UsingAny(containsMethods) 
select m

This query is not enough. It will list one function with the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ListOptimisation
{
    class Program
    {
        static void Main(string[] args)
        {
            int aLength = 10000;
            List<int> aNumbers2Search = Enumerable.Range(0, aLength).ToList();

            List<int> aTestList = Enumerable.Range(0, aLength).ToList();
            int[] aTestArray = Enumerable.Range(0, aLength).ToArray();

            HashSet<int> aTestHash = new HashSet<int>(Enumerable.Range(0, aLength));
            Dictionary<int, int> aTestDictionary = new Dictionary<int, int>();
            for(int i = 0; i < aLength; ++i)
            {
                aTestDictionary.Add(i, i);
            }

            Search(aTestList, aNumbers2Search);
            SearchIList(aTestList, aNumbers2Search);
            SearchIEnumerable(aTestList, aNumbers2Search);
            Search(aTestArray, aNumbers2Search);
            SearchIList(aTestArray, aNumbers2Search);
            SearchIEnumerable(aTestArray, aNumbers2Search);
            Search(aTestHash, aNumbers2Search);
            SearchIEnumerable(aTestHash, aNumbers2Search);
            Search(aTestDictionary, aNumbers2Search);
        }

        private static void Search(List<int> testList_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testList_in.Contains(x));
        }

        private static void Search(HashSet<int> testHash_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testHash_in.Contains(x));
        }

        private static void Search(Dictionary<int, int> testDictionary_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testDictionary_in.ContainsKey(x));
        }

        private static void Search(int[] testArray_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testArray_in.Contains(x));
        }

        private static void SearchIList(IList<int> testIList_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testIList_in.Contains(x));
        }

        private static void SearchIEnumerable(IEnumerable<int> testIEnumerable_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testIEnumerable_in.Contains(x));
        }
    }
}

A better query would be this one:

// <Name>Caution with List style contains</Name>
let containsMethods = ThirdParty.Methods.WithSimpleName("Contains").Except(ThirdParty.Methods.WithFullNameIn("System.Collections.Generic.HashSet<T>.Contains(T)"))

from m in Application.Methods.UsingAny(containsMethods) 
select m

//<Description>
// Alternative to Caution with List.Contains()
//</Description>

This will list 4 functions (List, IList, int[], IEnumerable). I am newbie regarding CQLinq. My questions are:

  • Does any one can write better query to detect possible bad .NET container usages (not just for contains, but for other possible operations)?
  • How do or would you detect bad container usage?

A last comment, some our business logic handles a lot of data so having correct containers, data structures and algorithms counts.

Laszlo
  • 302
  • 1
  • 10

2 Answers2

1

This is not a good approach to optimize performance problems. This optimization will have minor impact on your system unless you are dealing with huge lists.

You'll have better results using performance profiling software. If you want to performance by searching for some code pattern, try searching for nested loops and expensive code such as file and database related methods.

Community
  • 1
  • 1
lstern
  • 1,599
  • 14
  • 27
  • 1
    This is not micro-optimization, and we were able to pull huge performance gains in rewriting a couple of List<> handlings, O(n) ".Contains" time to HashSet and Dictionary O(1) ".Contains" time. – Laszlo Jul 22 '16 at 18:03
  • How are you measuring the performance improvement ? I know I'm not answering your question but you stated your problem (performance) and described a uncommon approach to solve this problem (and your query already catch all relevant contains IMO), Maybe you should edit your question providing more information about the system and how you are conducting this performance improvement process. – lstern Jul 22 '16 at 18:17
  • We have performance test that do business understandable scenarios and they log their running time. And yes we found a couple of bottlenecks with very long and tedious profiling and in the end it was bad container usages. We can not profile everything, the code base is huge (millions of line), it is legacy for me. – Laszlo Jul 22 '16 at 18:22
1

Indeed trying to replace List<T>.Contains() calls with Hashset<T>.Contains() calls is not a micro-optimization, and can dramatically improve performance. Actually refactoring an algorithm to rely on O(1) hashset search is, from my experience, one of the best thing to do to improve performance.

The CQLinq query you wrote is a first step to identify some potential slow points. However to start refactoring well, you must 1) review the code to assess the collection size at run-time and 2) use a performance profiling tool on real situation to assess if these potential slow points have an impact on performance and also, find others slow points not matched by the query.

Community
  • 1
  • 1
Patrick from NDepend team
  • 13,237
  • 6
  • 61
  • 92
  • Well we have business scenario performance tests, so I can validate my changes, and I can match the changed code to respective test, so that is not a problem. :-) My ultimate dream that someone would answer to my second question with collection of "well established" CQLinq queries called "Possible Performance bottlenecks". :-) – Laszlo Jul 26 '16 at 14:01
  • The query you wrote is right but won't be perfect. If a Hashset is created in a method and that ICollection.Contains() is called from another method on a ICollection object, NDepend won't be smart enough to guess that the call to Contains() is right, it is a static analyzer, not a dynamic analyzer ;-) – Patrick from NDepend team Jul 27 '16 at 14:49
  • 1
    Yes my query will give false positives, but it will include more "smells". And yes NDepends is not a dynamic analyzer, maybe I should put a tracer breakpoint on the .NET List.Contains... – Laszlo Jul 28 '16 at 08:52
  • Why not using a dynamic analyzer? – Patrick from NDepend team Jul 29 '16 at 18:59