Add vs Addrange with LINQ / IEnumerable

Question

Back in the days, I was always told to use AddRange whenever possible because of the performance gain over Add. Today, I wanted to validate this statement prior to using it with the output of a List<T>.Select(x => ...) call but from the decompiled code of List<T>, I am under the impression the foreach {add} loop should be faster.

The main reasons I see are the numerous additional checks that are done in the process since it is not an ICollection

null-check on the new items collection
boundary check on the index (since AddRange is in fact a call to InsertRange with index _size
try type cast to ICollection
type check to know if it is an ICollection (which it isn't after the select call)
another boundary check when calling Insert
capacity check (this one also exists in Add)
position check (since Insert can also put data prior or within the actual list)

Has anyone ever done reliable benchmarking on this?

Edit: Added Code samples of the two options

var clientsConfirmations = new List<ClientConfirmation>();

foreach (var client in Clients)
{
    var clientConf = new ClientConfirmation(client.Id, client.Name,
        client.HasFlag ?? false);
    clientsConfirmations.Add(clientConf);
}

Versus

var clientsConfirmations = new List<ClientConfirmation>();

clientsConfirmations.AddRange(
    Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
    )
);

May we see the code in question before compilation and after decompilation? How many elements are you adding? Have you done a reliable benchmark of this? See also [Why is AddRange faster than using a foreach loop?](https://stackoverflow.com/q/9836471/150605) — Lance U. Matthews, Sep 20 '22 at 21:51
Could you include in the question examples of the two alternative approaches that you want to compare? — Theodor Zoulias, Sep 20 '22 at 22:00
I would argue that this is too broad question. There are multiple possibilities that needs to be accounted and measured. For example adding items by one from `ICollection` can lead to multiple list resizes (while `AddRange` should perform at most one) which should be much costlier than those boundaries checks. — Guru Stron, Sep 20 '22 at 22:18
Sorry, since the question was at my sense a more generic/general question, I didn't thought it needed code samples. I added them. — bkqc, Sep 21 '22 at 14:19
You don't need AddRange as it will use the non optimized path. — Wouter, Sep 21 '22 at 14:58
What is the type of the `Clients` property? This information might be important, so please include it in the question. — Theodor Zoulias, Sep 21 '22 at 15:53

score 1 · Answer 1 · answered Sep 21 '22 at 15:40

AddRange is faster with ICollection<T> because it can check the size of any ICollection<T> argument and allocate a buffer large enough to hold all items from the start. This isn't the case in either example.

Right now, both options are equally slow, because both enumerate over an IEnumerable<T> of unknown size and add items to the buffer one by one. Every time the internal buffer is full, a new one is allocated with double the size, the old data is copied over, and the old buffer is orphaned and eventually garbage-collected.

The intial buffer holds 1 item. Adding a lot of items one by one ends up allocating log2(N) temporary buffers and taking up 2 times the memory the final buffer does.

This is the case even with the second snippet, which hides what's actually going on.

var clientsConfirmations = new List<ClientConfirmation>();

clientsConfirmations.AddRange(
    Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
    )
);

Is actually

var clientsConfirmations = new List<ClientConfirmation>();
IEnumerable<ClientConfirmation> results=Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false);
clientsConfirmations.AddRange(results);

A more readable (but equally slow) equivalent is :

var confirmations=Clients.Select(client => new ClientConfirmation(
                    client.Id, 
                    client.Name, 
                    client.HasFlag ?? false)
               .ToList();

ToList() will do what both snippets do - create a new List and add items one by one.

To get better performance the List<T>(int capacity) constructor should be used to create a list with a large enough initial buffer. This doesn't have to be accurate. Even a rough guess will reduce allocations and copies. Using 32 as the capacity will save 5 allocations:

var confirmations = new List<ClientConfirmation>(32);
var results=Clients.Select(client =>
        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false);
confirmations.AddRange(results);

Are you sure that the `Select` LINQ operator doesn't return an `IEnumerable` whose underlying type is also an `ICollection`? Check out these two code files to figure out: [Select.cs](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/Select.cs), [Select.SpeedOpt.cs](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/Select.SpeedOpt.cs). Even if this is not the case today, things might change in future .NET versions. — Theodor Zoulias, Sep 21 '22 at 16:06

score 0 · Answer 2 · answered Sep 20 '22 at 23:17

If you call somelist.AddRange(List<T>.Select(...)) it is not an ICollection<T> then it won't benefit from the optimized path where it can use a single resize and array.copy. However if it would be somelist.AddRange(List<T>.Select(...).ToList()) it might use it but you have to pay for the extra ToList() and that in total would still be worse because it creates an extra list which would be equivalent to calling AddRange on an empty list.

bkqc · Accepted Answer · 2022-09-21T17:28:00.773

I did a small test application to try to bench the two versions of the code. I'm not an expert at benchmarking so if you find problems in it, make me know.

I ran the bench (50k clients transformed 1k times 10 times so 50M iterations) and the results were

Iteration	First run	Add (ms)	AddRange (ms)	Delta
1	Add	25228	25385	157
2	AddRange	21561	24682	3121
3	Add	24182	25317	1135
4	AddRange	25647	24749	- 898
5	Add	23347	24508	1161
6	AddRange	22699	24416	1717
7	Add	25819	24491	-1328
8	AddRange	19830	22113	2283
9	Add	19376	19762	360
10	AddRange	25119	25220	101
Avg	-	23280.8	24064.3	783.5

I would say that in that specific context, Add is marginally faster than AddRange but still a clear winner and that unless it is performance critical code, it is a matter of preference / keeping the code semantically clean.

Benchmark Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Benchmarks
{
    class Program
    {
        private class Client
        {
            public int Id { get; set; }
            public string Name { get; set; }
            public bool? HasFlag { get; set; }
        }

        private class ClientConfirmation
        {
            public int Id { get; set; }
            public string Name { get; set; }
            public bool Flag { get; set; }

            public ClientConfirmation(int id, string name, bool flag)
            {
                Id = id;
                Name = name;
                Flag = flag;
            }
        }

        static void Main(string[] args)
        {
            var rng = new Random();
            for (var i = 1; i <= 10; i+=2)
            {
                AddVsAddRange(rng, i);
            }
            _ = Console.Read();
        }

        private static void AddVsAddRange(Random rng, int iteration)
        {
            var clients = new List<Client>();
            for (var i = 0; i < 50000; i++)
            {
                clients.Add(new Client()
                {
                    Id = rng.Next(),
                    Name = Guid.NewGuid().ToString(),
                    HasFlag = rng.Next(0, 3) switch
                    {
                        0 => false,
                        1 => true,
                        2 => null
                    }
                });
            }

            Console.WriteLine($"| {iteration} | {Version1(clients)} | {Version2(clients)} |");
            var v2 = Version2(clients);
            var v1 = Version1(clients);
            Console.WriteLine($"| {iteration+1} | {v1} | {v2} |");
            clients.Clear();
        }

        private static long Version1(IEnumerable<Client> clients)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();
            var clientsConfirmations = new List<ClientConfirmation>();

            for (var i = 0; i < 1000; i++)
                foreach (var client in clients)
                {
                    bool flag = false;

                    if (client.HasFlag.HasValue)
                    {
                        flag = client.HasFlag.Value;
                    }

                    var clientConf = new ClientConfirmation(client.Id, client.Name, flag);
                    clientsConfirmations.Add(clientConf);
                }

            var total =  sw.ElapsedMilliseconds;
            clientsConfirmations.Clear();
            return total;
        }

        private static long Version2(IEnumerable<Client> clients)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();
            var clientsConfirmations = new List<ClientConfirmation>();

            for (var i = 0; i < 1000; i++)
                clientsConfirmations.AddRange(
                    clients.Select(client =>
                        new ClientConfirmation(client.Id, client.Name, client.HasFlag ?? false)
                    )
                );

            var total = sw.ElapsedMilliseconds;
            clientsConfirmations.Clear();
            return total;
        }
    }
}

Edit: Create list with the right size

As @PanagiotisKanavos mentionned, setting the list size at creation will prevent the code from doing multiple allocations to fit the data. I modified the bench to take it into account. var clientsConfirmations = new List<ClientConfirmation>(clients.Count()); Here are the results:

Iteration	First run	Add (ms)	AddRange (ms)	Delta
1	Add	24150	23104	-1046
2	AddRange	24602	20391	-4211
3	Add	25723	25510	- 213
4	AddRange	23510	23041	- 469
5	Add	22621	20287	-2334
6	AddRange	21417	22995	1578
7	Add	23371	25797	2426
8	AddRange	24457	24695	238
9	Add	24980	24686	- 294
10	AddRange	25486	24038	-1448
Avg	Add	24031.7	23454.4	- 577.3

In that case, the trend is reversed and AddRange gets the win but I can't figure out why...

`var v2 = Version2(clients); var v1 = Version1(clients);` -- Could you rerun your benchmark with the two versions invoked in the reverse order, and see if the result about who is clear winner still holds? — Theodor Zoulias, Sep 21 '22 at 16:23
If you look closely, it is precisely the reason of this second call. The first one invokes the methods in the v1, v2 order. — bkqc, Sep 21 '22 at 17:18
I mean, call them in the reverse order: `var v1 = Version1(clients); var v2 = Version2(clients);`, and see what happens. What you declared as clear win might be just an artifact of how you did the benchmark. Differences that are so marginal are not reliable. — Theodor Zoulias, Sep 21 '22 at 17:34
Are you saying that this line does not do that? `Console.WriteLine($"| {iteration} | {Version1(clients)} | {Version2(clients)} |");` Also, in the table, you can notice that AddRange rows don't all end up with the same call as the winner. — bkqc, Sep 21 '22 at 17:37
What I want to say is that a 3% difference in a home-made benchmark proves practically nothing. Small differences on how you do the benchmark might swing the result in either direction. Don't put too much trust on those numbers. If you need to know with certainty the best between options that are so close, use the [BenchmarkDotNet](https://github.com/dotnet/BenchmarkDotNet) library. — Theodor Zoulias, Sep 21 '22 at 17:52

Add vs Addrange with LINQ / IEnumerable

3 Answers3