
I am working with a List<> collection, adding new objects to it inside two nested loops. Around 500,000 items have been added to the collection by the time the loops finish executing.

At first, the addition operation runs well, but soon a noticeable decrease in performance sets in; for the last few thousand elements, the delay is unbearable.

I have tried various tricks (initializing the collection with a fixed capacity of 500,000, replacing the List<> with a LinkedList<>), but it didn't help much.

Can you recommend a tip to solve the problem? I am interested in changing the structure to a more optimized one; LinkedList<>, for instance, performs better than List<> for operations such as addition.

The method that updates the list:

   private void UpdateForecastList(ConcurrentDictionary<Int32, RegistroSalidaProductoPrevision> prediccion, bool soloMejoresMetodos = true)
   {
        foreach (KeyValuePair<int, RegistroSalidaProductoPrevision> kvp in prediccion)
        {
            KeyValuePair<int, RegistroSalidaProductoPrevision> localKvp = kvp;

            IList<Prediccion> pExistente = prediccionList.Where(p => p.Id == localKvp.Key).ToList();

            Articulo articulo = (articuloList.Where(a => a.Id == localKvp.Key)).First();

            if (pExistente.Count > 0)
            {
                foreach (var p in pExistente)
                {
                    prediccionList.Remove(p);
                }
            }

            if (kvp.Value.Previsiones.Count > 0)
            {
                var previsiones = kvp.Value.Previsiones.Where(prevision => prevision.Value.LPrevision[1] != null).ToList();
                int previsionesCount = previsiones.Count;

                for (int a = 0; a < previsionesCount; a++)
                {
                    var registros = previsiones[a].Value.LPrevision[1].Serie;
                    int c = registros.Count;

                    if (soloMejoresMetodos)
                    {
                        if (localKvp.Value.MejorMetodo != previsiones[a].Key) continue;
                        for (int i = 0; i < c; i++)
                        {
                            var p = new Prediccion()
                                        {
                                            Id = articulo.Id,
                                            Nombre = articulo.Codigo,
                                            Descripcion = articulo.Descripcion,
                                            NombreMetodo =
                                                Utils.SplitStringByCapitals(previsiones[a].Value.NombreMetodo),
                                            Fecha = registros[i].Fecha,
                                            PrediccionArticulo = Math.Round(registros[i].Cantidad, 2),
                                            EsMejorMetodo =
                                                previsiones[a].Value.NombreMetodo == localKvp.Value.MejorMetodo
                                        };

                            // This line experiences performance loss
                            prediccionList.Add(p);
                        }
                    }
                    else
                    {
                        for (int i = 0; i < c; i++)
                        {
                            prediccionList.Add(new Prediccion()
                                                   {
                                                       Id = articulo.Id,
                                                       Nombre = articulo.Codigo,
                                                       Descripcion = articulo.Descripcion,
                                                       NombreMetodo = previsiones[a].Value.NombreMetodo,
                                                       Fecha = registros[i].Fecha,
                                                       PrediccionArticulo =
                                                           Math.Round(registros[i].Cantidad, 2),
                                                       EsMejorMetodo =
                                                           previsiones[a].Value.NombreMetodo ==
                                                           localKvp.Value.MejorMetodo
                                                   });
                        }
                    }
                }
            }
            else
            {
                prediccionList.Add(new Prediccion()
                                       {
                                           Id = articulo.Id,
                                           Nombre = articulo.Codigo,
                                           Descripcion = articulo.Descripcion,
                                           NombreMetodo = kvp.Value.ErroresDatos[0].Texto,
                                       });
            }
        }
    }

A short description of the method: it reads an object (a concurrent dictionary) and updates a list (in this case a LinkedList) with the forecasts corresponding to a given article.

The concurrent dictionary object is constantly updated from various threads that access it concurrently.

The list is initialized with null predictions corresponding to all the articles; for instance, if you have 700 articles, the list will initially be populated with 700 blank forecasts.

As the concurrent dictionary is updated by one of the computing threads, it raises an event which calls the method mentioned above, which in turn updates the list (prediccionList).

The maximum number of records that could be held in prediccionList (in this case) is about 500,000, but the loss in performance can be noticed after adding some 40,000 records to the list.

The code might seem a bit rusty, as I have tried various optimization tricks (replacing the foreaches with fors, calculating the counts outside the loops, replacing the List<> with a LinkedList<>, etc.). Eventually I reached the conclusion that the part that slows down execution is the line "prediccionList.Add(p);".

The objects added to the list are instances of the Prediccion class; I don't consider this object to be very heavy, as it only contains 7 fields.

Memory usage

I attach the result from a memory profiling session. The memory used doesn't surpass 256 MB, so I don't believe memory is the problem here. [memory profiler screenshot]

wonea
Linus
  • Where is it getting the 500000 items from? – Ash Burlaczenko Jul 29 '11 at 10:48
  • 6
    Could you provide a code sample that reproduces the problem? – alun Jul 29 '11 at 10:49
  • What type of objects are being added? – jalf Jul 29 '11 at 10:51
  • 1
  • Your slowdown probably is not the List. I've had Lists much longer than that without problems. Are your objects memory intensive, or do they use unmanaged resources, or something like that? – Jens Jul 29 '11 at 10:55
  • Are you sure the `Add` operations are slowing things down? If you run the loops without creating new objects (just add the same object where you would create a new one), do you get the same performance curve? – Binary Worrier Jul 29 '11 at 10:55
  • 1
    The LinkedList is optimized for insertion and will have a constant time to perform that operation. If you are experiencing decreases in performance utilizing that data structure, it is likely being caused by some kind of memory constraint. – Clayton Jul 29 '11 at 11:03
  • 1
    Although you say it didn't help, using the constructor which accepts a capacity is still excellent advice for a huge list. – Alex Humphrey Jul 29 '11 at 12:16
  • I appreciate this will be an exciting question for some, however, do you really need 500K items in your list? What is your use-case - would you not be better off querying a database perhaps? – Barry Kaye Jul 29 '11 at 10:49
  • this "saw" graphic of memory usage is probably showing your slowdowns (if the slowdown is when usage is increasing); this should be the part where .NET copies your "live" objects into a new memory space to release the old one. – Goran Obradovic Aug 01 '11 at 11:44

8 Answers

16

The problem has nothing to do with the performance of List or any other .NET data structure. Your problem is purely algorithmic. For example, you have this code fragment:

    foreach (KeyValuePair<int, RegistroSalidaProductoPrevision> kvp in prediccion)
    {
        KeyValuePair<int, RegistroSalidaProductoPrevision> localKvp = kvp;

        IList<Prediccion> pExistente = prediccionList.Where(p => p.Id == localKvp.Key).ToList();

        Articulo articulo = (articuloList.Where(a => a.Id == localKvp.Key)).First();

So for every item in the dictionary (prediccion), you're iterating over the entire prediccionList. You've implemented an O(n²) algorithm: the time it takes to execute that method is proportional to prediccion.Count * prediccionList.Count.

You need a better algorithm, not a faster collection data structure.
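One way the fix can be sketched is by building dictionary indexes once, before the loop. This is a minimal, self-contained illustration, assuming Prediccion and Articulo are matched purely on an integer Id; the type and variable names mirror the question's code but are stand-ins:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-ins for the real classes (assumptions, not the question's types).
class Prediccion { public int Id; }
class Articulo  { public int Id; }

class Demo
{
    static void Main()
    {
        var prediccionList = new List<Prediccion>
        {
            new Prediccion { Id = 1 }, new Prediccion { Id = 1 }, new Prediccion { Id = 2 }
        };
        var articuloList = new List<Articulo> { new Articulo { Id = 1 }, new Articulo { Id = 2 } };

        // Build each index ONCE, before the main loop: O(n) total,
        // instead of an O(n) scan per dictionary entry.
        Dictionary<int, List<Prediccion>> prediccionesPorId =
            prediccionList.GroupBy(p => p.Id).ToDictionary(g => g.Key, g => g.ToList());
        Dictionary<int, Articulo> articulosPorId =
            articuloList.ToDictionary(a => a.Id);

        // Inside the loop, each lookup is now O(1):
        Articulo articulo = articulosPorId[1];
        List<Prediccion> pExistente = prediccionesPorId[1];
        Console.WriteLine(pExistente.Count); // prints 2
    }
}
```

Note that prediccionList.Remove(p) is itself an O(n) scan on a List<T>, so if removals stay hot, keeping the forecasts in a Dictionary<int, List<Prediccion>> keyed by article Id instead of a flat list would remove that cost as well.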

Jim Mischel
8

In my experience, List<T> performance is memory-dependent. It always follows the same pattern: inserting is fast up to a point, and then performance drops sharply. On my machine that usually happens when I hit the 1.2 GB memory mark. I've had the same issue with almost all the collections I've tried, so I think it's more of an underlying .NET issue than a List<T> issue.

I would recommend trying to reduce the size of the objects you are creating 500,000 of (replacing longs with ints, etc.) and trying again.
But beware: even if you manage to make it work fast on your machine, it might still be over the threshold of the machine where the app is deployed.

SWeko
6

If the objects that you're adding to the list are of any significant size, you could be suffering from memory constraints.

If your process is 32-bit, you're going to be constrained to 2GB in total before running out of address space, but if it's 64-bit, you could easily exceed the physical memory in the machine and start paging to disk.

How big are your objects?

Steve Morgan
  • I don't know how you calculated 2GB, but um, 32 bit is 4GB. – xxxxxxxxxadfas Jul 29 '11 at 12:47
  • 8
    32-bit processes are allocated 2GB of the 4GB address space. If processes are 'Large Address Aware', they can access 3GB on 32-bit Windows and 4GB on 64-bit Windows. 64-bit processes on 64-bit Windows can access 8TB. – Steve Morgan Jul 29 '11 at 12:55
6

As your list grows larger, every time it needs to expand, the framework copies its contents to a new memory location, and the garbage collector must later reclaim the old one. That is why it becomes slower and slower as it gets larger. (GC on MSDN)

Possible solutions (that I can think of) are using a list or array with a predefined size that you are sure won't fill up, or, if that is not an option, using System.Collections.Generic.LinkedList. As you have already tried that, you may have to implement a custom list, singly-linked if applicable (LinkedList is doubly-linked).
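The resizing cost is easy to observe directly through the List<T>.Capacity property; this small sketch prints a line every time the backing array is reallocated (each reallocation copies every existing element):

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;

        for (int i = 0; i < 500_000; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Each capacity change means the backing array was
                // reallocated and all existing items were copied over.
                Console.WriteLine($"Count={list.Count,7}  Capacity={list.Capacity}");
                lastCapacity = list.Capacity;
            }
        }
    }
}
```

With `new List<int>(500_000)` the capacity never changes during the loop, which is why pre-sizing avoids all of this copying.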

To increase the chance of getting a good answer, you should post the code of the object you keep in the collection and the part where you add items, so we can better understand what this is all about.

Also, please take a look at http://www.simple-talk.com/dotnet/performance/the-top-5-.net-memory-management-misconceptions/, I think that it will be of help to you.

UPDATE: Indexing should be a cheap operation, but nevertheless you may try reading previsiones[a] (and registros[i] in the nested loop) into a local variable at the beginning of the loop; you would save a couple of indexings (times 100,000 iterations, that could make some difference, if the CLR is not already optimizing this).
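A sketch of that hoisting, with minimal stand-in types (the class names and values are illustrative, not the question's real classes):

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-ins to show the hoisting pattern (assumed names).
class Registro { public DateTime Fecha; public decimal Cantidad; }
class Metodo
{
    public string Nombre = "MetodoEjemplo";
    public List<Registro> Serie = new List<Registro> { new Registro() };
}

class HoistDemo
{
    static void Main()
    {
        var previsiones = new List<Metodo> { new Metodo() };

        for (int a = 0; a < previsiones.Count; a++)
        {
            // Hoist the repeated index expression into locals once per
            // outer iteration, instead of re-indexing in the inner loop.
            Metodo prevision = previsiones[a];
            List<Registro> registros = prevision.Serie;
            int c = registros.Count;

            for (int i = 0; i < c; i++)
            {
                Registro registro = registros[i]; // one indexing per iteration
                Console.WriteLine($"{prevision.Nombre}: {registro.Cantidad}");
            }
        }
    }
}
```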

Goran Obradovic
  • >every time when it is expanding, framework is copying its contents to a new list this is a most incredible piece of fiction writing I ever read, since alice in wonderland. keep writing goran. – Boppity Bop Jun 08 '17 at 20:21
  • 1
    @BoppityBop fixed the offending word. You are right, someone who doesn't know how GC works could be misled to think it is every time an object is added to the list. – Goran Obradovic Jun 12 '17 at 15:41
3

Using a struct instead of a class can improve your performance significantly.

You can also gain performance by dropping the string properties from the Prediccion class/struct.

I had been interested in the actual impact for a long time, so here is my benchmark:

I took different data structures and put 20 million objects/structs in each. Here are the results:

List:
Adding 20000000 TestClass to a List`1 took 3563,2068 ms
Accessing 20000000 TestClass from a List took 103,0203 ms
Adding 20000000 TestStruct to a List`1 took 2239,9639 ms
Accessing 20000000 TestStruct from a List took 254,3245 ms

Initialized List:
Adding 20000000 TestClass to a List`1 took 3774,772 ms
Accessing 20000000 TestClass from a List took 99,0548 ms
Adding 20000000 TestStruct to a List`1 took 1520,7765 ms
Accessing 20000000 TestStruct from a List took 257,5064 ms

LinkedList:
Adding 20000000 TestClass to a LinkedList`1 took 6085,6478 ms
Adding 20000000 TestStruct to a LinkedList`1 took 7771,2243 ms

HashSet:
Adding 20000000 TestClass to a HashSet`1 took 10816,8488 ms
Adding 20000000 TestStruct to a HashSet`1 took 3694,5187 ms

Now I added a string to the class/struct:
List:
Adding 20000000 TestClassWithString to a List`1 took 4925,1215 ms
Accessing 20000000 TestClassWithString from a List took 120,0348 ms
Adding 20000000 TestStructWithString to a List`1 took 3554,7463 ms
Accessing 20000000 TestStructWithString from a List took 456,3299 ms

This is my test program:

    static void Main(string[] args)
    {
        const int noObjects = 20*1000*1000;

        Console.WriteLine("List:");
        RunTest(new List<TestClass>(), noObjects);
        RunTest(new List<TestStruct>(), noObjects);
        Console.WriteLine();

        Console.WriteLine("Initialized List:");
        RunTest(new List<TestClass>(noObjects), noObjects);
        RunTest(new List<TestStruct>(noObjects), noObjects);
        Console.WriteLine();

        Console.WriteLine("LinkedList:");
        RunTest(new LinkedList<TestClass>(), noObjects);
        RunTest(new LinkedList<TestStruct>(), noObjects);
        Console.WriteLine();

        Console.WriteLine("HashSet:");
        RunTest(new HashSet<TestClass>(), noObjects);
        RunTest(new HashSet<TestStruct>(), noObjects);
        Console.WriteLine();

        Console.WriteLine("Now I added a string to the class/struct:");
        Console.WriteLine("List:");
        RunTest(new List<TestClassWithString>(), noObjects);
        RunTest(new List<TestStructWithString>(), noObjects);
        Console.WriteLine();

        Console.ReadLine();
    }




    private static void RunTest<T>(ICollection<T> collection, int noObjects) where T : ITestThing
    {
        Stopwatch sw = new Stopwatch();
        sw.Restart();
        for (int i = 0; i < noObjects; i++)
        {
            var obj = Activator.CreateInstance<T>();
            obj.Initialize();
            collection.Add(obj);
        }
        sw.Stop();
        Console.WriteLine("Adding " + noObjects + " " + typeof(T).Name + " to a " + collection.GetType().Name + " took " + sw.Elapsed.TotalMilliseconds + " ms");

        if (collection is IList)
        {
            IList list = (IList) collection;
            // access all list objects
            sw.Restart();
            for (int i = 0; i < noObjects; i++)
            {
                var obj = list[i];
            }
            sw.Stop();
            Console.WriteLine("Accessing " + noObjects + " " + typeof (T).Name + " from a List took " + sw.Elapsed.TotalMilliseconds + " ms");
        }
    }

TestClass and TestStruct both look like this (one with 'class', one with 'struct'):

public class TestClass : ITestThing
{
    public int I1;
    public int I2;
    public double D1;
    public double D2;
    public long L1;
    public long L2;

    public void Initialize()
    {
        D1 = 1;
        D2 = 2;
        I1 = 3;
        I2 = 4;
        L1 = 5;
        L2 = 6;
    }
}

TestStruct is identical except that it is declared public struct instead of public class; TestClassWithString and TestStructWithString additionally have a public string S1 field, which is initialized to "abc".

ITestThing exists only because structs cannot have a parameterless constructor, so I needed a generic way to call the Initialize() method; as it turns out, it doesn't make much of a difference whether I call Initialize() or not.

Please note that the differences in the durations would be even more extreme if I had written the code out plainly for every test case, without any interface or Activator.CreateInstance, but that code grew too big as soon as I added the second test case.

SUMMARY

You can greatly improve performance by using a List with an initial size and putting structs in it rather than class instances (objects). Also try to avoid strings in your structs, because every string instance is itself a heap object, which is exactly what you were trying to avoid by using a struct.

JCH2k
2

How about using an array instead of a List? You could initialize it with an initial size (say, 500,000 elements) and, if that's not enough, use Array.Resize to add another 100,000. You just have to keep track of the actual number of elements yourself, as the Length property only gives you the size of the array, not how many slots are in use.

Note, however, that the Array.Resize call may also be time-consuming: internally, a new array of the new size is allocated and all elements from the original array are copied into it. You should not call it too often.
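A minimal sketch of that grow-in-chunks pattern (the sizes mirror the numbers above; the Add helper is illustrative):

```csharp
using System;

class GrowableArrayDemo
{
    static void Main()
    {
        var items = new long[500_000]; // initial size
        int count = 0;                 // track the used-element count yourself

        void Add(long value)
        {
            if (count == items.Length)
            {
                // Costly: allocates a new array and copies every element.
                Array.Resize(ref items, items.Length + 100_000);
            }
            items[count++] = value;
        }

        for (long i = 0; i < 600_000; i++)
            Add(i);

        Console.WriteLine($"{count} used of {items.Length} allocated");
        // prints: 600000 used of 600000 allocated
    }
}
```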

Thorsten Dittmar
  • 1
    How would that help? How is it different from initializing a List with the appropriate size, as he already does? – jalf Jul 29 '11 at 10:52
  • 1
    @jalf, i'm not sure on the exact specifics, but List underneath uses an array. I would imagine there is quite a bit of creating new arrays and copy the data across with so many elements involved. I may be completely wrong though :) – Darren Young Jul 29 '11 at 11:01
  • @jalf, I'd just assume that the implementation for `List<>` introduces some more overhead than a "plain array". I'm not saying that using `Array` is faster or better, I'm only suggesting to give it a try and see what happens. – Thorsten Dittmar Jul 29 '11 at 11:16
0

Have you tried giving a capacity on initialization? Then the list won't need to re-allocate memory and copy the old contents to a new memory space:

List<long> thelist = new List<long>(500000);
tesla
-2

You can use an array, which will be faster (but not queryable). I don't know the specifics of your code, but you may want to refactor and use a database instead. 500,000 items will never be fast.

wonea
Tom Squires