2

I'm trying to speed up a loop that clones 2_500_000 objects. The cloning itself takes 800 ms for the entire loop, but when I add the clones to a list it takes 3 seconds.

List<T> list = new List<T>();

Stopwatch sw = new Stopwatch();
sw.Start();

foreach(T entity in listSource)
{
    T entityCloned = GetEntityClone(entity); // Taking 800ms totally

    if (entityCloned != null)
        list.Add(entityCloned);
}

sw.Stop();

Could you please help me to find out why those Adds are taking so much time?

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
  • 3
    Try `list.AddRange(listSource.Select(s => GetEntityClone(s)).Where(s => s != null));` – FCin Sep 26 '18 at 10:51
  • If the only way for a clone to be `null` is that the original itself is `null`, then you might want to write `listSource.Where(s => s != null).Select(s => GetEntityClone(s))` instead (filter before cloning) – Rafalon Sep 26 '18 at 10:53
  • 6
    `List<T> list = new List<T>(2500000);` in order to prevent memory reallocation – Dmitry Bychenko Sep 26 '18 at 11:00
  • I just tried AddRange with Linq and setting the internal capacity of the list but it's not changing at all :'( – Julien Dinis Sep 26 '18 at 11:03
  • Adding N items is O(N) overall because the list expands its capacity geometrically [0, 4, 8, 16, 32, 64, 128, ...]. Each reallocation still has to copy the existing items, which is why you get these "extreme" timings – X39 Sep 26 '18 at 11:05
  • 1
    What is T? A class, struct? Classes are passed by reference. Inserting the Nth reference into a List shouldn't take long. Structs though will have to be copied. Inserting a struct into a list requires making a copy of the struct returned by `GetEntityClone()`. Copying large structs takes time – Panagiotis Kanavos Sep 26 '18 at 11:15
  • @Panagiotis Kanavos: it seems that `T` is *class*: `if (entityCloned != null)` is of no use in case of `struct` – Dmitry Bychenko Sep 26 '18 at 11:27
  • @DmitryBychenko there's no info so we all guess. We can't even know how long cloning really takes, what the actual code looks like, how it performed *after* adding the capacity. – Panagiotis Kanavos Sep 26 '18 at 11:32
  • @JulienDinis use BenchmarkDotNet instead of trying to measure times using Stopwatch. Right now you don't know if your timings are affected by other code, the test harness, the garbage collector or reallocations. BenchmarkDotNet collects enough samples to ensure the times are correct and captures GCs, allocations/deallocations etc. Without your actual code people can only guess - setting a reference inside a preallocated list is *not* slow. We'd need access to the actual code to test it and profile it with BenchmarkDotNet – Panagiotis Kanavos Sep 26 '18 at 11:33
  • 1
    How about simply `new List<T>(listSource.Count)`? If 33% of the performance can be shaved off by a static array, giving the list an initial capacity would do the same thing. – Lasse V. Karlsen Sep 26 '18 at 12:29
  • And the main reason why a list is taking a lot of time to use in this manner might be because of garbage collection. On its way to 2.5m items, a total of 21 reallocations have been done on that list, moving a total of 4.1m items from one array to another, getting rid of these allocations and memory movement by specifying the capacity to begin with will help a lot. – Lasse V. Karlsen Sep 26 '18 at 12:32
  • I already tried that, it's not working :/ – Julien Dinis Sep 26 '18 at 13:12

3 Answers

2

Unfortunately, looping over that many items and deep-copying objects is going to take time. I don't think that 3 seconds is necessarily an unreasonable amount of time for this.

But you can potentially improve the speed.

Firstly, if you know beforehand how many items the result list will need to hold, you can set its capacity up front so the list never has to resize. Resizing is an expensive operation that should be avoided where possible. You can do this either by setting the list's `Capacity` property or by passing the capacity as a constructor argument.
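For example, a minimal sketch (assuming listSource exposes a Count, as a List<T> does):

List<T> list = new List<T>(listSource.Count); // backing array allocated once, no regrowth

// equivalent alternative: set the Capacity property after construction
// list.Capacity = listSource.Count;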

Once the capacity has been allocated, adding to the list is O(1); no reallocations (which are an O(n) operation, see this answer) will be needed. In that case it is unlikely that adding to the list is the bottleneck.

You could also remove null values from the source list beforehand, which removes the need to evaluate the if statement on every iteration. Using LINQ:

var noNulls = listSource.Where(o => o != null);
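Putting the suggestions together, a rough sketch (the null check has to stay after the clone, since the OP's GetEntityClone decides which results come back null):

var list = new List<T>(listSource.Count);            // pre-sized, no regrowth
list.AddRange(listSource
    .Select(GetEntityClone)                          // clone each entity
    .Where(clone => clone != null));                 // keep only non-null clones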

Michael Hancock
  • 2,673
  • 1
  • 18
  • 37
  • Unfortunately, I can't filter null values beforehand because my clone method does that filtering itself. I already tried setting the internal capacity to the size of my source list, but it doesn't change anything. – Julien Dinis Sep 26 '18 at 11:02
  • 4
    It might be useful to add your clone method to the question so we can see what is happening in there. Would you be able to do this? – Michael Hancock Sep 26 '18 at 11:03
  • OP mentioned that deep copying does not take a significant amount of time, but adding to a list does. One may argue that the OP made a mistake and did not measure properly, but this should be investigated before going straight to the deep-cloning issue. – Michal B. Sep 26 '18 at 13:01
  • 1
    Good point. I do think that adding to the list is unlikely to be the bottleneck though, especially if OP has tried adding a capacity. – Michael Hancock Sep 26 '18 at 13:31
0

I saved some time (around 33%) by using an array instead of a list:

MyObject class definition:

public class MyObject
{
    public int Id { get; set; }
    public bool Flag { get; set; }

    public static MyObject GetEntityClone(MyObject obj)
    {
        if (obj == null) return null;

        var newObj = new MyObject()
        {
            Id = obj.Id,
            Flag = obj.Flag
        };

        return newObj;
    }
}

Code:

var sourceList = new List<MyObject>();

// let's mock the source data, every 27th element will be null
for (int i = 0; i < 2500000; ++i)
{
    if (i % 27 != 0)
        sourceList.Add(new MyObject { Id = i, Flag = (i % 2 == 0) });
}

var destArray = new MyObject[2500000];

Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine(sw.ElapsedMilliseconds);

var currentElement = 0;
for (int i = 0; i < sourceList.Count; ++i)
{
    MyObject entityCloned = MyObject.GetEntityClone(sourceList[i]);

    if (entityCloned != null)
        destArray[currentElement++] = entityCloned;
}

var result = new MyObject[currentElement];
Array.Copy(destArray, 0, result, 0, currentElement);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
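
As a side note, the final trim-copy could also be written with Array.Resize, which allocates the smaller array and copies for you (same behaviour as the manual Array.Copy above):

// shrink destArray to the number of elements actually written
Array.Resize(ref destArray, currentElement);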
Michal B.
  • 5,676
  • 6
  • 42
  • 70
0

Try the following to do the work in parallel:

ConcurrentBag<T> list = new ConcurrentBag<T>();
Parallel.ForEach(listSource, entity =>
{
    T entityCloned = GetEntityClone(entity); //Taking 800ms totally
    if (entityCloned != null)
        list.Add(entityCloned);

});

var listVersion = list.ToList();
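
Note that ConcurrentBag<T> does not preserve the order of the source. If ordering matters and GetEntityClone is thread-safe, a PLINQ variant is another option (a sketch, not benchmarked):

var listVersion = listSource
    .AsParallel()
    .AsOrdered()                        // keep the original order of listSource
    .Select(GetEntityClone)             // clone in parallel
    .Where(clone => clone != null)      // drop null clones
    .ToList();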