76

After reading this article I decided to take a closer look at the way I was using Dapper.

I ran this code on an empty database

var members = new List<Member>();
for (int i = 0; i < 50000; i++)
{
    members.Add(new Member()
    {
        Username = i.ToString(),
        IsActive = true
    });
}

using (var scope = new TransactionScope())
{
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

    scope.Complete();
}

It took about 20 seconds. That's 2,500 inserts/second. Not bad, but not great either, considering the blog was achieving 45k inserts/second. Is there a more efficient way to do this in Dapper?

Also, as a side note, running this code through the Visual Studio debugger took over 3 minutes! I figured the debugger would slow it down a little, but I was really surprised to see that much.

UPDATE

So this

using (var scope = new TransactionScope())
{
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

    scope.Complete();
}

and this

    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

both took 20 seconds.

But this took 4 seconds!

SqlTransaction trans = connection.BeginTransaction();

connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members, transaction: trans);

trans.Commit();
vgru
kenwarner
  • Have you tried a DbTransaction (SqlTransaction)? That is a bit less overhead. Also, just for numbers: maybe try it without a transaction too, so we know what we are measuring. Finally, what does the 20s span? The inserts? The inserts+complete? Everything? Something else? – Marc Gravell May 21 '12 at 20:29
  • 20s is just the part encompassed by the using statement. I'll try the SqlTransaction – kenwarner May 21 '12 at 20:32
  • The different transaction type can't shave off more that a few milliseconds. It is a one-time cost, not proportional to the number of items. – usr May 21 '12 at 20:40
  • @usr all I'm saying is: it would be nice to be able to say that based on having the numbers – Marc Gravell May 21 '12 at 20:41
  • Edited in an update. I'm a little surprised at the result, but it is what it is. I've run the 3 approaches twice each with empty databases, same results each time. – kenwarner May 21 '12 at 20:46
  • Benchmark your SqlTransaction version against a raw SqlCommand/SqlTransaction one on your machine. You need to be comparing the apples to apples. Also do a warmup round of 1 insert prior to measuring anything. – Sam Saffron May 21 '12 at 23:34
  • TransactionScope does a bunch of DTC nonsense you generally do not care about; I would avoid it unless I needed that feature. It is quite easy to roll your own context attached to thread-local storage – Sam Saffron May 21 '12 at 23:36
  • Just converted another method I had that took 21 minutes to do 425k inserts, now it does it in 55 seconds. <3 – kenwarner May 22 '12 at 00:23
  • @qntmfred 12500/s isn't a bad effort ;p Since you've done all the work, I would suggest you self-answer this - it seems like using the explicit connection-based transaction is the trick for optimum performance. I am actually surprised it out-performs *not* using a transaction, but: evidence is evidence, data is data. – Marc Gravell May 22 '12 at 06:22
  • @MarcGravell I think he is using a transaction, just not using TransactionScope, which does all the DTC nonsense – Sam Saffron May 23 '12 at 12:50
  • @MarcGravell note, even SqlTransaction has some additional overhead over "begin tran" – Sam Saffron May 23 '12 at 12:51
  • Hey guys, just wondering why it is slower when running in the VS debugger? Is this captured somewhere on the Dapper page? Thanks – melaos Dec 05 '12 at 09:20
  • And what would the performance be if the connection was opened within the TransactionScope as per the MSDN example? – Richard Collette May 30 '13 at 04:34
  • @MarcGravell I have three properties in my object class and I want to insert only two of them in the query. What do I do in that case? – Meer Aug 11 '16 at 11:40
  • @MarcGravell For example, the Member class has three properties: Username, IsActive, and Email. I have a list of Member objects and want to insert only Username and IsActive – Meer Aug 11 '16 at 11:43
  • Original article link is broken - @kenwarner, is http://www.gamasutra.com/view/news/170502/Indepth_SQL_Server__High_performance_inserts.php a reprint? – pete the pagan-gerbil Oct 28 '16 at 07:18
  • Don't use TransactionScope when you can use a native database transaction! – Martin.Martinsson May 07 '19 at 18:22

6 Answers

89

The best I was able to achieve was 50k records in 4 seconds using this approach:

SqlTransaction trans = connection.BeginTransaction();

connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members, transaction: trans);

trans.Commit();
kenwarner
  • Maybe the connection isn't enlisted in the transaction and you got the problem before? – GorillaApe Oct 23 '12 at 13:35
  • @Parhs: that's a 50% improvement. – skolima Oct 01 '15 at 16:09
  • I am just curious, why did you put your code in a transaction? What's the difference if we don't use a transaction in this case? – cuongle Nov 11 '15 at 16:45
  • @skolima Although you can phrase it as "3 records per minute" to "6 records per minute" (or any equivalent amounts), and then it's a 100% improvement – Rob Dec 18 '15 at 01:06
  • @CuongLe I'd imagine he wants the bulk insert as a whole to be atomic. Otherwise, there isn't a need for a transaction (a single insert, if it fails, would throw an exception that would halt execution) – Sudhanshu Mishra Jan 04 '16 at 22:56
  • Hi, I used the same code. I'm inserting 100000 records, but it's taking 62 sec, yet you are saying 50000 records took only 4 sec. [code](http://i.stack.imgur.com/suVUV.png) Can you please tell me what I'm doing wrong here? – Amit Kumar Sep 23 '16 at 05:09
  • Not sure if it's a change in Dapper or just how I use it, but I had to use `IDbTransaction trans = con.BeginTransaction();`. I saw a 60-times performance increase for 200,000 entries that have a PK. – AndrewWhalan Nov 10 '16 at 03:03
  • @AmitKumar insert speed depends hugely on how many indexes you have on your table, and also on the physical speed of your storage medium. – Erik Bergstedt Dec 14 '16 at 13:08
  • @ErikBergstedt: My table has no `non-clustered` indexes, only one `clustered` index on the PK, so it should not take that much time. I still could not find any way to reduce the time, so I'm asking. – Amit Kumar Dec 15 '16 at 05:41
  • @AmitKumar interesting. What type of hard drive do you use? Does anyone else use this drive? – Erik Bergstedt Dec 15 '16 at 10:09
  • Guys, I just looked at SQL Server Profiler and I'm puzzled: this code doesn't do batching (at least for me); it inserts one row per operation (meaning 1500 rows = 1500 separate insert commands). I use SQL Server 2012 with the latest Dapper – chester89 Jan 13 '17 at 10:03
  • @chester89 I agree with you. This isn't bulk inserting anything; it's just regular inserts. I don't think Dapper can handle bulk inserts at this point – Joe Phillips Aug 14 '17 at 21:23
  • Tim Corey has a good video including bulk inserts (no, Dapper can't do it): https://www.youtube.com/watch?v=eKkh5Xm0OlU&t=12s – niico Mar 05 '21 at 09:20
  • What about the case when you need rows' ids? – honzakuzel1989 Jan 12 '22 at 10:54
14

I stumbled across this recently and noticed that the TransactionScope is created after the connection is opened (I assume this since Dapper's Execute doesn't open the connection, unlike Query). According to the answer to Q4 here: https://stackoverflow.com/a/2886326/455904, that will not result in the connection being handled by the TransactionScope. My workmate ran some quick tests, and opening the connection outside the TransactionScope drastically decreased performance.

So changing to the following should work:

// Assuming the connection isn't already open
using (var scope = new TransactionScope())
{
    connection.Open();
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

    scope.Complete();
}
Fredrik Ljung
  • If you try to `ExecuteAsync`, this will throw an exception: "A TransactionScope must be disposed on the same thread that it was created". To avoid this: `using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))` – Josh Noe Sep 01 '16 at 18:11
6

I created an extension method that would allow you to do a bulk insert very quickly.

public static class DapperExtensions
{
    public static async Task BulkInsert<T>(
        this IDbConnection connection,
        string tableName,
        IReadOnlyCollection<T> items,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        const int MaxBatchSize = 1000;
        const int MaxParameterSize = 2000;

        var batchSize = Math.Min((int)Math.Ceiling((double)MaxParameterSize / dataFunc.Keys.Count), MaxBatchSize);
        var numberOfBatches = (int)Math.Ceiling((double)items.Count / batchSize);
        var columnNames = dataFunc.Keys;
        var insertSql = $"INSERT INTO {tableName} ({string.Join(", ", columnNames.Select(e => $"[{e}]"))}) VALUES ";
        var sqlToExecute = new List<Tuple<string, DynamicParameters>>();

        for (var i = 0; i < numberOfBatches; i++)
        {
            var dataToInsert = items.Skip(i * batchSize)
                .Take(batchSize);
            var valueSql = GetQueries(dataToInsert, dataFunc);

            sqlToExecute.Add(Tuple.Create($"{insertSql}{string.Join(", ", valueSql.Item1)}", valueSql.Item2));
        }

        foreach (var sql in sqlToExecute)
        {
            await connection.ExecuteAsync(sql.Item1, sql.Item2, commandTimeout: int.MaxValue);
        }
    }

    private static Tuple<IEnumerable<string>, DynamicParameters> GetQueries<T>(
        IEnumerable<T> dataToInsert,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        var parameters = new DynamicParameters();

        return Tuple.Create(
            dataToInsert.Select(e => $"({string.Join(", ", GenerateQueryAndParameters(e, parameters, dataFunc))})"),
            parameters);
    }

    private static IEnumerable<string> GenerateQueryAndParameters<T>(
        T entity,
        DynamicParameters parameters,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        var paramTemplateFunc = new Func<Guid, string>(guid => $"@p{guid.ToString().Replace("-", "")}");
        var paramList = new List<string>();

        foreach (var key in dataFunc)
        {
            var paramName = paramTemplateFunc(Guid.NewGuid());
            parameters.Add(paramName, key.Value(entity));
            paramList.Add(paramName);
        }

        return paramList;
    }
}

Then to use this extension method, you would write code like the following:

await dbConnection.BulkInsert(
    "MySchemaName.MyTableName",
    myCollectionOfItems,
    new Dictionary<string, Func<MyObjectToInsert, object>>
        {
            { "ColumnOne", u => u.ColumnOne },
            { "ColumnTwo", u => u.ColumnTwo },
            ...
        });

This is quite primitive and has further room for improvement, such as passing in a transaction or a commandTimeout value, but it does the trick for me.

CallumVass
0

I found all these examples incomplete.

Here is some code that properly closes the connection after use, and also correctly uses the TransactionScope to improve Execute performance, based on the more recent and better answers in this thread.

using (var scope = new TransactionScope()) 
{
    Connection.Open();
    Connection.Execute(sqlQuery, parameters);

    scope.Complete();
}
Erik Bergstedt
  • The using keyword automatically calls IDispose which calls .Close(). The using keyword converts your code at compile time into a try/finally where .Dispose() is called in the finally. – Cubicle.Jockey Jun 03 '16 at 17:41
  • I thought IDispose only called Dispose? – Erik Bergstedt Jun 03 '16 at 18:54
  • You are correct. It calls Dispose, but the connection's implementation of Dispose calls Close(). I also assume that's a SqlConnection object. But yeah, the using helps so you don't forget to call Close, as it's called in Dispose. – Cubicle.Jockey Jun 03 '16 at 18:57
0

Using the Execute method with a single INSERT statement will never do a bulk insert or be efficient. Even the accepted answer with a transaction doesn't do a bulk insert.

If you want to perform a bulk insert, use SqlBulkCopy: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy

You will not find anything faster than this.
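For example, a minimal SqlBulkCopy sketch for the Member(Username, IsActive) table from the question (the table name, column mappings, and the open `connection` and populated `members` list are assumptions taken from the question's code) might look like:

```csharp
// Build an in-memory DataTable matching the destination schema.
var table = new DataTable();
table.Columns.Add("Username", typeof(string));
table.Columns.Add("IsActive", typeof(bool));

foreach (var member in members)
{
    table.Rows.Add(member.Username, member.IsActive);
}

// Stream all rows to the server in one bulk operation.
// The SqlConnection must already be open before WriteToServer is called.
using (var bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "Member";
    bulkCopy.ColumnMappings.Add("Username", "Username");
    bulkCopy.ColumnMappings.Add("IsActive", "IsActive");
    bulkCopy.WriteToServer(table);
}
```

Tuning `SqlBulkCopy.BatchSize` and passing `SqlBulkCopyOptions.TableLock` can push throughput further, depending on your table and indexes.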

Dapper Plus

Disclaimer: I'm the owner of the project Dapper Plus

This project is not free, but it offers all bulk operations:

  • BulkInsert
  • BulkUpdate
  • BulkDelete
  • BulkMerge

(These use SqlBulkCopy under the hood.)

And some more options such as outputting identity values:

// CONFIGURE & MAP entity
DapperPlusManager.Entity<Order>()
                 .Table("Orders")
                 .Identity(x => x.ID);

// CHAIN & SAVE entity
connection.BulkInsert(orders)
          .AlsoInsert(order => order.Items)
          .Include(x => x.ThenMerge(order => order.Invoice)
                         .AlsoMerge(invoice => invoice.Items))
          .AlsoMerge(x => x.ShippingAddress);

Our library supports multiple providers:

  • SQL Server
  • SQL Compact
  • Oracle
  • MySql
  • PostgreSQL
  • SQLite
  • Firebird
Jonathan Magnan
-3

The fastest variant for me:

var dynamicParameters = new DynamicParameters();
var selects = new List<string>();
for (var i = 0; i < members.Length; i++)
{
    var member = members[i];
    var pUsername = $"u{i}";
    var pIsActive = $"a{i}";
    dynamicParameters.Add(pUsername, member.Username);
    dynamicParameters.Add(pIsActive, member.IsActive);
    // The $ prefix is required here so the parameter names are interpolated
    // rather than inserted as literal "{pUsername}" text.
    selects.Add($"select @{pUsername},@{pIsActive}");
}
con.Execute($"insert into Member(Username, IsActive) {string.Join(" union all ", selects)}", dynamicParameters);

which generates SQL like:

INSERT TABLENAME (Column1,Column2,...)
 SELECT @u0,@a0...
 UNION ALL
 SELECT @u1,@a1...
 UNION ALL
 SELECT @u2,@a2...

This query works faster because SQL Server inserts a set of rows at a time instead of adding one row per statement. The bottleneck is not writing the data; it's writing what you're doing to the log.

Also, look into the rules of minimally logged transactions.

razon