
TL;DR

Why does running multiple SqlBulkCopy inserts, against unrelated tables, async & in parallel, on a single Transaction seem to behave as though it's running in series instead?


Context

I have some code that is calculating and storing a large volume of data. The calculation is done up-front, so the storage section of the code gets handed this big pile of data to be stored.

My DB writes are being done with SqlBulkCopy.WriteToServerAsync which does the job nicely, in general.

Amongst the things I need to store are 6 tables which are related in business terms, but not in SQL terms. As such, my writes to them need to be in a single transaction, so that an error on any one write reverts the writes to all the others.

The performance of this code is fairly critical, so I want to be able to run the BulkInserts in parallel. There are no FKeys, and no other tables are being interacted with (data integrity is managed by the code), so I don't see any reason this shouldn't be possible.


What I've currently written

I thought I knew how to write all the code and have been able to get it all working, but there's a weird performance slow-down that I don't understand:

Happy to provide any actual bits of code you want, but this is already a very long Q, and the code would be pretty long too. LMK if you do want to see anything.

I can write:

  • "BulkInsert into each table sequentially, all in a single Transaction".

    • i.e. I open a new SqlConnection() and .BeginTransaction(),
    • then I foreach over the 6 tables, and await InsertToTable(transaction) each table before the foreach moves to the next one.
    • When the foreach concludes then I .Commit() the transaction and close the connection.
    • I have a large-volume test that runs this version in 184 seconds (95%, +/- 2.45s).
  • "BulkInsert into each table sequentially, with a new connection & Transaction for each table."

    • i.e. I foreach over the 6 tables, and await InsertToTable() each table before the foreach moves to the next one.
    • Inside each InsertToTable() call I open a new SqlConnection and BeginTransaction, and then I .Commit() and .Close() before returning from the method.
    • I have a large-volume test that runs this version in 185 seconds (95%, +/- 3.34s).
  • "BulkInsert into each table in parallel, with a new connection & Transaction for each table."

    • i.e. I initiate all 6 of my tasks by calling thisTableTask = InsertToTable() for each table, and capturing the Tasks but not awaiting them (yet).
    • I await Task.WhenAll() the 6 tasks captured.
    • Inside each InsertToTable() call I open a new SqlConnection and BeginTransaction, and then I .Commit() and .Close() before returning from the method (but note that the loop has moved on to the next table, because it doesn't await the Task immediately).
    • I have a large-volume test that runs this version in 144 seconds (95%, +/- 5.20s).
  • "BulkInsert into each table in parallel, all in a single Transaction".

    • i.e. I open a new SqlConnection() and .BeginTransaction().
    • Then I initiate all 6 of my tasks by calling thisTableTask = InsertToTable(transaction) for each table, and capturing the Tasks but not awaiting them (yet). (This variant is sketched just after this list.)
    • I await Task.WhenAll() the 6 tasks captured.
    • Once the WhenAll concludes then I .Commit() the transaction and close the connection.
    • I have a large-volume test that runs this version in 179 seconds (95%, +/- 1.78s).
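
To make that last variant concrete, the orchestration for "parallel + single Transaction" is shaped roughly like this. It's a simplified sketch rather than my real code: connectionString, tables, and the exact InsertToTable signature are stand-ins.

private async Task InsertAllTablesInParallel_SharedTransaction()
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync();
        using (var transaction = connection.BeginTransaction())
        {
            // Start all 6 inserts without awaiting, so they're all in flight at once.
            var insertTasks = tables
                .Select(table => InsertToTable(table, transaction))
                .ToList();

            await Task.WhenAll(insertTasks);
            transaction.Commit();
        }
    }
}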

In all cases the eventual BulkInsert looks like:

using (var sqlBulk = BuildSqlBulkCopy(tableName, columnNames, transactionToUse))
{
    await sqlBulk.WriteToServerAsync(dataTable);
}

private SqlBulkCopy BuildSqlBulkCopy(string tableName, string[] columnNames, SqlTransaction transaction)
{
    var bulkCopy = new SqlBulkCopy(transaction.Connection, SqlBulkCopyOptions.Default, transaction)
    {
        BatchSize = 10000,
        DestinationTableName = tableName,
        BulkCopyTimeout = 3600
    };

    foreach (var columnName in columnNames)
    {
        // Relies on setting up the data table with column names matching the database columns.
        bulkCopy.ColumnMappings.Add(columnName, columnName);
    }

    return bulkCopy;
}
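
For the separate-Transaction variants, InsertToTable owns its own connection and transaction instead. Again a rough sketch with stand-in names (TableData is a placeholder for however the table name, column names and DataTable get bundled together):

private async Task InsertToTable(TableData table)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync();
        using (var transaction = connection.BeginTransaction())
        {
            using (var sqlBulk = BuildSqlBulkCopy(table.Name, table.ColumnNames, transaction))
            {
                await sqlBulk.WriteToServerAsync(table.DataTable);
            }

            transaction.Commit();
        }
    }
}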


Current Performance stats

As listed above

  • Sequential + single Tran = 184s
  • Sequential + separate Trans = 185s
  • Parallel + separate Trans = 144s
  • Parallel + single Tran = 179s

Those first 3 results all make sense to me.

#1 vs #2: As long as the inserts all work, the Transactions don't do much. The DB is still doing all the same work, at the same points in time.

#2 vs #3: This was the entire point of running the inserts in parallel. By running the inserts in parallel, we spend less time waiting around for SQL to do its thing. We're making the DB do a lot of work in parallel, so it's not a full 6-fold speed-up, but it's still plenty.


QUESTION:

Why is the last case so slow? And can I fix it?

  • Parallel + single Tran = 179s

That's almost as slow as doing it in series, and fully 25% slower than doing it in parallel, but with multiple transactions!

What's going on? Why does running multiple SqlBulkCopy inserts, against unrelated tables, async & in parallel, on a single Transaction seem to behave as though it's running in series instead?


Non-Dupes:

SqlBulkCopy Multiple Tables Insert under single Transaction OR Bulk Insert Operation between Entity Framework and Classic Ado.net (Isn't running the queries in parallel)

Using SqlBulkCopy in one transaction for multiple, related tables (Tables are related and they're trying to read back out of them)

Parallel Bulk Inserting with SqlBulkCopy and Azure (that's talking about parallel load into a single table)

Brondahl
  • If I understand correctly for the last use case, you are using the same connection/transaction and therefore must be using MARS. With MARS, execution of each insert bulk is interleaved, not parallel. I think you need a distributed transaction with different connections for true parallel execution. – Dan Guzman Jul 18 '20 at 15:35
  • On a side note related to performance, make sure the data table column types exactly match those of the target table, including maximum column length for varchar/nvarchar/char/nchar columns. – Dan Guzman Jul 18 '20 at 15:40
  • @DanGuzman `I think you need a distributed transaction`. :sad-trombone: That's the precise thing I have to avoid: https://github.com/dotnet/runtime/issues/715 . – Brondahl Jul 18 '20 at 15:47
  • But thanks for offering this explanation of why I'm not getting the perf I hoped for. I'd say that counts as a fully fledged answer. Do you want to post it as such, and earn magical internet points? – Brondahl Jul 18 '20 at 15:48
  • @Brondahl, I'm not a big fan of distributed transactions either, but I suspect you'll get the best throughput despite the commit overhead in this case. I suggest you [race your horses](https://ericlippert.com/2012/12/17/performance-rant/) to (dis)prove my theory. – Dan Guzman Jul 18 '20 at 15:51
  • @DanGuzman you misunderstand. :) I have no overwhelming objection to Distributed Transactions, especially since (as I understand it?) my broader use-case is more-or-less what they were designed for. My problem is that .NET Core has decided not to support them for *anyone* until they can support them for *everyone* (which is hard, because multi-platform). So I can't race these horses, because .NET Core has banned the favourite from racing. – Brondahl Jul 18 '20 at 16:16
  • You could use the full .NET Framework instead of .NET core, if you don't mind Windows ;-) – Dan Guzman Jul 18 '20 at 16:36

1 Answer


The only way to execute multiple commands concurrently on the same SQL Server connection/transaction is using Multiple Active Result Sets (MARS). MARS is used in the parallel single transaction case because you're using the same connection/transaction for each parallel bulk copy.

MARS executes SELECT and insert bulk operations as interleaved, not parallel, so you'll get about the same performance as serial execution. You need a distributed transaction with different connections for true parallel execution within the same transaction scope.
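
Two practical notes on that. First, MARS is off by default; for multiple concurrent commands to share one connection at all, it must be enabled in the connection string (MultipleActiveResultSets=True). Second, on the full .NET Framework (not .NET Core, per the comments above), the distributed-transaction approach would look something like the sketch below: each insert opens its own connection inside an ambient TransactionScope, which escalates to an MSDTC transaction once a second connection enlists. InsertToTableOwnConnection is a hypothetical method that opens its own SqlConnection per table and does not begin its own SqlTransaction.

// Requires System.Transactions, and MSDTC available to both client and server.
private async Task InsertAllTablesDistributed(IEnumerable<TableData> tables)
{
    using (var scope = new TransactionScope(
        TransactionScopeOption.Required,
        TransactionScopeAsyncFlowOption.Enabled)) // Flow the ambient transaction across awaits.
    {
        // Each task opens its own SqlConnection, which auto-enlists in the ambient
        // transaction; more than one enlisted connection escalates it to a distributed transaction.
        await Task.WhenAll(tables.Select(table => InsertToTableOwnConnection(table)));

        scope.Complete(); // Commit across all enlisted connections.
    }
}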

Dan Guzman