TL;DR
Why does running multiple SqlBulkCopy inserts, against unrelated tables, async
& in parallel, on a single Transaction seem to behave as though it's running in series instead?
Context
I have some code that is calculating and storing a large volume of data. The calculation is done up-front, so the storage section of the code gets handed this big pile of data to be stored.
My DB writes are being done with SqlBulkCopy.WriteToServerAsync
which does the job nicely, in general.
Amongst the things I need to store are 6 tables which are business-related, but not SQL-related. As such, my write to them needs to be in a transaction, so that an error on any one write reverts the writes on all the others.
The performance of this code is fairly critical, so I want to be able to run the BulkInserts in parallel. There are no FKeys or any other tables being interacted with, (data integrity is managed by the code) so I don't see any reason that this shouldn't be possible.
What I've currently written
I thought I knew how to write all the code and have been able to get it all working, but there's a weird performance slow-down that I don't understand:
Happy to provide actual bits of code you want, but this is already a very long Q, and the code would be pretty long to0. LMK if you do want to see anything.
I can write:
"BulkInsert into each table sequentially, all in a single Transaction".
- i.e. I open a
new SqlConnection()
and.BeginTransaction()
, - then I
foreach
over the 6 tables, andawait InsertToTable(transaction)
each table before theforeach
moves to the next one. - When the
foreach
concludes then I.Commit()
the transaction and close the connection. - I have a large-volume test that runs this version in 184 seconds (95%, +/- 2.45s).
- i.e. I open a
"BulkInsert into each table sequentially, with a new connection & Transaction for each table."
- i.e. I
foreach
over the 6 tables, andawait InsertToTable()
each table before theforeach
moves to the next one. - Inside each
InsertToTable()
call I open a newSqlConnection
andBeginTransaction
, and then I.Commit()
and.Close()
before returning from the method. - I have a large-volume test that runs this version in 185 seconds (95%, +/- 3.34s).
- i.e. I
"BulkInsert into each table in parallel, with a new connection & Transaction for each table."
- i.e. I initiate all 6 of my tasks by calling
thisTableTask = InsertToTable()
for each table, and capturing theTask
s but notawait
ing them (yet). - I
await Task.WhenAll()
the 6 tasks captured. - Inside each
InsertToTable()
call I open a newSqlConnection
andBeginTransaction
, and then I.Commit()
and.Close()
before returning from the method. (but note that the foreach has moved onto the next table, because it doesn'tawait
the Task immediately. - I have a large-volume test that runs this version in 144 seconds (95%, +/- 5.20s).
- i.e. I initiate all 6 of my tasks by calling
"BulkInsert into each table in parallel, all in a single Transaction".
- i.e. I open a
new SqlConnection()
and.BeginTransaction()
. - Then I initiate all 6 of my tasks by calling
thisTableTask = InsertToTable(transaction)
for each table, and capturing theTask
s but notawait
ing them (yet). - I
await Task.WhenAll()
the 6 tasks captured. - Once the
WhenAll
concludes then I.Commit()
the transaction and close the connection. - I have a large-volume test that runs this version in 179 seconds (95%, +/- 1.78s).
- i.e. I open a
In all cases the eventual BulkInsert looks like:
using (var sqlBulk = BuildSqlBulkCopy(tableName, columnNames, transactionToUse))
{
await sqlBulk.WriteToServerAsync(dataTable);
}
private SqlBulkCopy BuildSqlBulkCopy(string tableName, string[] columnNames, SqlTransaction transaction)
{
var bulkCopy = new SqlBulkCopy(transaction.Connection, SqlBulkCopyOptions.Default, transaction)
{
BatchSize = 10000,
DestinationTableName = tableName,
BulkCopyTimeout = 3600
};
foreach (var columnName in columnNames)
{
// Relies on setting up the data table with column names matching the database columns.
bulkCopy.ColumnMappings.Add(columnName, columnName);
}
return bulkCopy;
}
Current Performance stats
As listed above
- Sequential + single Tran = 184s
- Sequential + separate Trans = 185s
- Parallel + separate Tran = 144s
- Parallel + single Tran = 179s
Those first 3 results all make sense to me.
#1 vs #2: As long as the inserts all work, the Transactions don't do much. The DB is still doing all the same work, at the same points in time.
#2 vs #3: This was the entire point of running the inserts in parallel. By running the inserts in parallel, we spend less time waiting around for SQL to do it's thing. We're making the DB do a lot of work in parallel, so it's not as much as a 6-fold speed up, but it's still plenty.
QUESTION:
Why is the last case so slow? And can I fix it?
- Parallel + single Tran = 179
That's almost as slow as doing it in series, and fully 25% slower than doing it in parallel, but with multiple transactions!
What's going on?
Why does running multiple SqlBulkCopy inserts, against unrelated tables, async
& in parallel, on a single Transaction seem to behave as though it's running in series instead?
Non-Dupes:
SqlBulkCopy Multiple Tables Insert under single Transaction OR Bulk Insert Operation between Entity Framework and Classic Ado.net (Isn't running the queries in parallel)
Using SqlBulkCopy in one transaction for multiple, related tables (Tables are related and they're trying to read back out of them)
Parallel Bulk Inserting with SqlBulkCopy and Azure (that's talking about parallel load into a single table)