In our web application, we provide paginated search panels for various database tables. We currently allow users to select individual rows and, via the UI, execute some operation on each selected record.

For example, a panel of document records offers the ability to delete documents. A user may check 15 checkboxes representing 15 document identifiers and choose Options > Delete. This works just fine.

I wish to offer the users an option to execute some operation for all rows matching the query used to display the data in the panel.

We may have 5,000 documents matching some search criteria, and wish to allow a user to delete all 5,000. (I understand this example is a bit contrived; let's ignore the 'wisdom' of allowing users to delete documents in bulk!)

Execution of a method for thousands of rows is a long-running operation, so I will queue the operation instead. Consider this an equivalent of Gmail's ability to apply a filter to all email conversations matching some search criteria.

I need to execute a query that will return an unknown number of rows, and for each row, insert a row into a queue (in the code below, the queue is represented by ImportFileQueue).

I coded it as follows:

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
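    // Note: the async lambda below compiles to an async void delegate.
    // Parallel.ForEach does not await it, so the save operations overlap
    // freely on the shared connection and its ambient transaction.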
    Parallel.ForEach<IDictionary<string, object>>(reader.Enumerate(), async properties =>
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // application tier calculation here; cannot do in SQL
            };
            await instance.SaveAsync();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    });
}

When running this in a unit test that wraps the call in a transaction, I receive a System.Data.SqlClient.SqlException with the message "Transaction context in use by another session."

This is easily resolved by either:

  • changing the database call from async to sync, or
  • removing the Parallel.ForEach and iterating through the reader serially (a sketch of this appears below).
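
For reference, the latter (serial) approach would look roughly like this; it's a sketch reusing the types from the code above, with each save awaited before the next row is processed:

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    foreach (IDictionary<string, object> properties in reader.Enumerate())
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // application tier calculation here; cannot do in SQL
            };
            // Awaited one at a time, so the connection and its transaction
            // context are never used by two operations concurrently.
            await instance.SaveAsync();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    }
}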

I opted for the former:

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    Parallel.ForEach<IDictionary<string, object>>(reader.Enumerate(), properties =>
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // Omitted for brevity
            };
            instance.Save();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    });
}

My thought process is, in typical use cases:

  • the outer reader will often have thousands of rows
  • the instance.Save() call is "lightweight", inserting a single row into the db

Two questions:

  1. Is there a reasonable way to use async/await inside the Parallel.ForEach, where the inner code is using SqlConnection (avoiding the transaction context error)?
  2. If not, given my expected typical use case, is my choice to leverage the TPL and forfeit async/await for the single-row saves reasonable?

The answer suggested in What is the reason of “Transaction context in use by another session” says:

Avoid multi-threaded data operations if it's possible (no matter loading or saving). E.g. save SELECT/UPDATE/ etc... requests in a single queue and serve them with a single-thread worker;

but I'm trying to minimize total execution time, and figured Parallel.ForEach was more likely to reduce it.

Eric Patrick
  • You are using the same SqlConnection from multiple threads? – Evk Jun 09 '17 at 12:43
  • `Parallel.ForEach` is not `async` aware and is not intended for use with `async` operations, e.g. database access. To spin off multiple asynchronous operations at once you can use `Select` with an asynchronous `Func` (see the sketch after these comments) – JSteward Jun 09 '17 at 12:44
  • Why do you want to use `Parallel` at all? Modifying N rows in a batch is typically N times faster than trying to modify them individually. If your updates are slow, fix your data access method. Use batching to send *one* set of commands instead of multiple individual commands, for example. – Panagiotis Kanavos Jun 09 '17 at 12:52
  • Better yet, if you read and write to the same database, don't go through the client. Write an `UPDATE FROM` statement to modify the data on the server side. If you want to *insert* a lot of data, use SqlBulkCopy to insert all of them with minimal logging – Panagiotis Kanavos Jun 09 '17 at 12:54
  • @EricPatrick What is your *actual* problem? Why are you trying to insert records in parallel? – Panagiotis Kanavos Jun 09 '17 at 12:56
  • @PanagiotisKanavos, I'm trying to balance minimum run time (TPL) with maximum concurrency (`async/await` on db access). I was surprised I could not do both. I'm inserting records in parallel to minimize run time. I glossed over the detail that prevents me from a pure SQL solution to this; I will edit the post shortly annotating such. The right answer may well be to reuse a single `SqlConnection` inside the `Parallel.ForEach`, but that would require a fair amount of re-architecture of `instance.Save()`. – Eric Patrick Jun 09 '17 at 14:08
  • @EricPatrick why do you want *concurrency* when inserting? The fastest way to insert a lot of data is to use bulk loading, sending the data to the database as a stream, as fast as the network and disk can handle. You can do this on the client side as well, using SqlBulkCopy. *Concurrency* means locking and contention, which is why it can lead to *lower* throughput. After all, you have one network card and are writing to the same storage. Concurrent connections contend for the same network, disk, and CPU resources. – Panagiotis Kanavos Jun 09 '17 at 14:14
  • @EricPatrick Bulk loading is *not* the same as sending a batch of commands either. The database uses minimal logging, i.e. it doesn't log every single INSERT command. It copies the modified data pages to the log, resulting in fewer IO operations. – Panagiotis Kanavos Jun 09 '17 at 14:17
  • @EricPatrick the only way that parallel loading is going to help is if it doesn't lead to contention, or at least results in minimal contention. On the server side, that means writing to different tables, or different partitions of the same table. Each operation is going to lock only one partition. – Panagiotis Kanavos Jun 09 '17 at 14:20
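
To illustrate JSteward's suggestion above, here is a minimal sketch of `Select` with an asynchronous `Func`. It assumes each `SaveAsync` call opens its own `SqlConnection` (no shared connection or ambient transaction), and it throttles concurrency with a `SemaphoreSlim`; the limit of 8 is an arbitrary illustration value:

using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var throttle = new SemaphoreSlim(8); // cap the number of in-flight saves

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    var tasks = reader.Enumerate()
        .Select(async properties =>
        {
            await throttle.WaitAsync();
            try
            {
                var instance = new ImportFileQueueObject(User)
                {
                    // application tier calculation here
                };
                await instance.SaveAsync();
            }
            finally
            {
                throttle.Release();
            }
        })
        .ToList(); // materialize so every task is started

    await Task.WhenAll(tasks); // observe completion and any exceptions
}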

1 Answer

It's almost always a bad idea to open a transaction and then wait for I/O while holding it open. You'll get much better performance (and fewer deadlocks) by buffering the data first. If there's more total data than you can easily buffer in memory, buffer it into chunks of a thousand or so rows at a time. Put each of those in a separate transaction if possible.
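
As a rough sketch of that chunking approach, reusing the reader from the question (the 1,000-row chunk size and the SaveChunk helper, which would open a transaction, insert the buffered rows, and commit, are illustrative assumptions):

using System.Collections.Generic;

const int ChunkSize = 1000; // assumed chunk size

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    var chunk = new List<IDictionary<string, object>>(ChunkSize);

    foreach (var properties in reader.Enumerate())
    {
        chunk.Add(properties); // buffer in memory; no transaction open yet
        if (chunk.Count == ChunkSize)
        {
            SaveChunk(chunk); // hypothetical: begin transaction, insert rows, commit
            chunk.Clear();
        }
    }

    if (chunk.Count > 0)
        SaveChunk(chunk); // final partial chunk
}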

Whenever you open a transaction, any locks taken remain held until it is committed (and locks are taken whether you want them or not when you're inserting data). Those locks cause other updates, and reads without WITH(NOLOCK), to sit and wait until the transaction is committed. In a high-performance system, if you're doing I/O while those locks are held, it's pretty much guaranteed to cause problems as other callers start an operation and then sit and wait while this operation does its I/O with the transaction still open.

James
  • The transaction in this case is from a test suite, and is sugar to roll back the inserted data after the test is complete. I *do* have the option of not wrapping the whole thing in a transaction in PROD/real-world environments. Nonetheless, I want to *understand the impact* of mixing Parallel.ForEach with async/await. – Eric Patrick Jun 09 '17 at 14:29
  • The problem is that you're sharing a physical resource, the connection, among threads. A better approach is to open one connection per thread, so they are independent. – VMAtm Jun 09 '17 at 15:20