-1

Back in the days, I was always told to use AddRange whenever possible because of the performance gain over Add. Today, I wanted to validate this statement prior to using it with the output of a List<T>.Select(x => ...) call but from the decompiled code of List<T>, I am under the impression the foreach {add} loop should be faster.

The main reasons I see are the numerous additional checks that are done in the process since it is not an ICollection

  • null-check on the new items collection
  • boundary check on the index (since AddRange is in fact a call to InsertRange with index _size
  • try type cast to ICollection
  • type check to know if it is an ICollection (which it isn't after the select call)
  • another boundary check when calling Insert
  • capacity check (this one also exists in Add)
  • position check (since Insert can also put data prior or within the actual list)

Has anyone ever done reliable benchmarking on this?


Edit: Added Code samples of the two options

var clientsConfirmations = new List<ClientConfirmation>();

foreach (var client in Clients)
{
    var clientConf = new ClientConfirmation(client.Id, client.Name,
        client.HasFlag ?? false);
    clientsConfirmations.Add(clientConf);
}

Versus

var clientsConfirmations = new List<ClientConfirmation>();

clientsConfirmations.AddRange(
    Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
    )
);
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
bkqc
  • 831
  • 6
  • 25
  • May we see the code in question before compilation and after decompilation? How many elements are you adding? Have you done a reliable benchmark of this? See also [Why is AddRange faster than using a foreach loop?](https://stackoverflow.com/q/9836471/150605) – Lance U. Matthews Sep 20 '22 at 21:51
  • Could you include in the question examples of the two alternative approaches that you want to compare? – Theodor Zoulias Sep 20 '22 at 22:00
  • I would argue that this is too broad question. There are multiple possibilities that needs to be accounted and measured. For example adding items by one from `ICollection` can lead to multiple list resizes (while `AddRange` should perform at most one) which should be much costlier than those boundaries checks. – Guru Stron Sep 20 '22 at 22:18
  • Sorry, since the question was at my sense a more generic/general question, I didn't thought it needed code samples. I added them. – bkqc Sep 21 '22 at 14:19
  • You don't need AddRange as it will use the non optimized path. – Wouter Sep 21 '22 at 14:58
  • What is the type of the `Clients` property? This information might be important, so please include it in the question. – Theodor Zoulias Sep 21 '22 at 15:53

3 Answers3

1

AddRange is faster with ICollection<T> because it can check the size of any ICollection<T> argument and allocate a buffer large enough to hold all items from the start. This isn't the case in either example.

Right now, both options are equally slow, because both enumerate over an IEnumerable<T> of unknown size and add items to the buffer one by one. Every time the internal buffer is full, a new one is allocated with double the size, the old data is copied over, and the old buffer is orphaned and eventually garbage-collected.

The intial buffer holds 1 item. Adding a lot of items one by one ends up allocating log2(N) temporary buffers and taking up 2 times the memory the final buffer does.

This is the case even with the second snippet, which hides what's actually going on.

var clientsConfirmations = new List<ClientConfirmation>();

clientsConfirmations.AddRange(
    Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
    )
);

Is actually

var clientsConfirmations = new List<ClientConfirmation>();
IEnumerable<ClientConfirmation> results=Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false);
clientsConfirmations.AddRange(results);

A more readable (but equally slow) equivalent is :

var confirmations=Clients.Select(client => new ClientConfirmation(
                    client.Id, 
                    client.Name, 
                    client.HasFlag ?? false)
               .ToList();

ToList() will do what both snippets do - create a new List and add items one by one.

To get better performance the List<T>(int capacity) constructor should be used to create a list with a large enough initial buffer. This doesn't have to be accurate. Even a rough guess will reduce allocations and copies. Using 32 as the capacity will save 5 allocations:

var confirmations = new List<ClientConfirmation>(32);
var results=Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false);
confirmations.AddRange(results);
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • Are you sure that the `Select` LINQ operator doesn't return an `IEnumerable` whose underlying type is also an `ICollection`? Check out these two code files to figure out: [Select.cs](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/Select.cs), [Select.SpeedOpt.cs](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/Select.SpeedOpt.cs). Even if this is not the case today, things might change in future .NET versions. – Theodor Zoulias Sep 21 '22 at 16:06
0

If you call somelist.AddRange(List<T>.Select(...)) it is not an ICollection<T> then it won't benefit from the optimized path where it can use a single resize and array.copy. However if it would be somelist.AddRange(List<T>.Select(...).ToList()) it might use it but you have to pay for the extra ToList() and that in total would still be worse because it creates an extra list which would be equivalent to calling AddRange on an empty list.

Wouter
  • 2,540
  • 19
  • 31
0

I did a small test application to try to bench the two versions of the code. I'm not an expert at benchmarking so if you find problems in it, make me know.

I ran the bench (50k clients transformed 1k times 10 times so 50M iterations) and the results were

Iteration First run Add (ms) AddRange (ms) Delta
1 Add 25228 25385 157
2 AddRange 21561 24682 3121
3 Add 24182 25317 1135
4 AddRange 25647 24749 - 898
5 Add 23347 24508 1161
6 AddRange 22699 24416 1717
7 Add 25819 24491 -1328
8 AddRange 19830 22113 2283
9 Add 19376 19762 360
10 AddRange 25119 25220 101
Avg - 23280.8 24064.3 783.5

I would say that in that specific context, Add is marginally faster than AddRange but still a clear winner and that unless it is performance critical code, it is a matter of preference / keeping the code semantically clean.


Benchmark Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Benchmarks
{
    class Program
    {
        private class Client
        {
            public int Id { get; set; }
            public string Name { get; set; }
            public bool? HasFlag { get; set; }
        }

        private class ClientConfirmation
        {
            public int Id { get; set; }
            public string Name { get; set; }
            public bool Flag { get; set; }

            public ClientConfirmation(int id, string name, bool flag)
            {
                Id = id;
                Name = name;
                Flag = flag;
            }
        }

        static void Main(string[] args)
        {
            var rng = new Random();
            for (var i = 1; i <= 10; i+=2)
            {
                AddVsAddRange(rng, i);
            }
            _ = Console.Read();
        }

        private static void AddVsAddRange(Random rng, int iteration)
        {
            var clients = new List<Client>();
            for (var i = 0; i < 50000; i++)
            {
                clients.Add(new Client()
                {
                    Id = rng.Next(),
                    Name = Guid.NewGuid().ToString(),
                    HasFlag = rng.Next(0, 3) switch
                    {
                        0 => false,
                        1 => true,
                        2 => null
                    }
                });
            }

            Console.WriteLine($"| {iteration} | {Version1(clients)} | {Version2(clients)} |");
            var v2 = Version2(clients);
            var v1 = Version1(clients);
            Console.WriteLine($"| {iteration+1} | {v1} | {v2} |");
            clients.Clear();
        }

        private static long Version1(IEnumerable<Client> clients)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();
            var clientsConfirmations = new List<ClientConfirmation>();

            for (var i = 0; i < 1000; i++)
                foreach (var client in clients)
                {
                    bool flag = false;

                    if (client.HasFlag.HasValue)
                    {
                        flag = client.HasFlag.Value;
                    }

                    var clientConf = new ClientConfirmation(client.Id, client.Name, flag);
                    clientsConfirmations.Add(clientConf);
                }

            var total =  sw.ElapsedMilliseconds;
            clientsConfirmations.Clear();
            return total;
        }

        private static long Version2(IEnumerable<Client> clients)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();
            var clientsConfirmations = new List<ClientConfirmation>();

            for (var i = 0; i < 1000; i++)
                clientsConfirmations.AddRange(
                    clients.Select(client =>
                        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
                    )
                );

            var total = sw.ElapsedMilliseconds;
            clientsConfirmations.Clear();
            return total;
        }
    }
}


Edit: Create list with the right size

As @PanagiotisKanavos mentionned, setting the list size at creation will prevent the code from doing multiple allocations to fit the data. I modified the bench to take it into account. var clientsConfirmations = new List<ClientConfirmation>(clients.Count()); Here are the results:

Iteration First run Add (ms) AddRange (ms) Delta
1 Add 24150 23104 -1046
2 AddRange 24602 20391 -4211
3 Add 25723 25510 - 213
4 AddRange 23510 23041 - 469
5 Add 22621 20287 -2334
6 AddRange 21417 22995 1578
7 Add 23371 25797 2426
8 AddRange 24457 24695 238
9 Add 24980 24686 - 294
10 AddRange 25486 24038 -1448
Avg Add 24031.7 23454.4 - 577.3

In that case, the trend is reversed and AddRange gets the win but I can't figure out why...

bkqc
  • 831
  • 6
  • 25
  • `var v2 = Version2(clients); var v1 = Version1(clients);` -- Could you rerun your benchmark with the two versions invoked in the reverse order, and see if the result about who is clear winner still holds? – Theodor Zoulias Sep 21 '22 at 16:23
  • If you look closely, it is precisely the reason of this second call. The first one invokes the methods in the v1, v2 order. – bkqc Sep 21 '22 at 17:18
  • I mean, call them in the reverse order: `var v1 = Version1(clients); var v2 = Version2(clients);`, and see what happens. What you declared as clear win might be just an artifact of how you did the benchmark. Differences that are so marginal are not reliable. – Theodor Zoulias Sep 21 '22 at 17:34
  • Are you saying that this line does not do that? `Console.WriteLine($"| {iteration} | {Version1(clients)} | {Version2(clients)} |");` Also, in the table, you can notice that AddRange rows don't all end up with the same call as the winner. – bkqc Sep 21 '22 at 17:37
  • What I want to say is that a 3% difference in a home-made benchmark proves practically nothing. Small differences on how you do the benchmark might swing the result in either direction. Don't put too much trust on those numbers. If you need to know with certainty the best between options that are so close, use the [BenchmarkDotNet](https://github.com/dotnet/BenchmarkDotNet) library. – Theodor Zoulias Sep 21 '22 at 17:52