2

Background:

I have a service which queries the db for historical results in batches. The batches are based on a start time and an end time. The data between all the batches is mutually exclusive, so for a given set of batches, they can be run in any order. Some batches can take more time to execute than others. If any batch fails then the entire process is aborted/stopped and the error is logged. When returning back data to the client, the data from all the batches need to be combined.

Problem:

Return the data to the client as fast as possible.

Initial solutions:

Initially I was executing the batches in order synchronously. This was not taking advantage of the fact that the data is mutually exclusive. After initial use, it was found that this load method was taking too long. After some debugging, I found the main cause for slowness was the execution time of the sql queries. So the next solution was to try to execute each batch asynchronously using BeginExecuteReader() and EndExecuteReader(). After calling all batches asynchronously, the service would wait in a while loop and poll every 300ms to check if any of the queries had completed. If it was, then it would be read.

int batchCount = 0, resultCount = 0;
List<AsyncDataResult> asyncCalls = new List<AsyncDataResult>();

while (true)
{
    if (asyncCalls.Count == 0)
    {
        break;
    }

    if ((asyncCalls[resultCount]).AsyncResult.IsCompleted)
    {
        ReadAsyncResultsFromDb(asyncCalls[resultCount]);
        asyncCalls.RemoveAt(resultCount);
    }

    resultCount++;
    if (resultCount >= asyncCalls.Count)
    {
        resultCount = 0;
        Thread.Sleep(300);
    }
}

The above approach decreased the loading time for large data sets, but for very small data sets (but across many batches), the poll is actually adding to the loading delay.

Question:

  1. How do I execute the sql asynchronously, but not do polling?
  2. Start reading each batch as soon as it completes?

Update: Sorry about this, but I forgot to add the part where I need to return in the same method that is calling the asynchronous calls. The reason for this is that the dataset that I need to populate is passed in as a parameter to this method. Using the IAsyncCallback from the begin reader would require me to change the entire class. I was hoping that I wouldn't have to do that.

Ryan Gates
  • 4,501
  • 6
  • 50
  • 90
EndlessSpace
  • 1,320
  • 2
  • 24
  • 46

3 Answers3

2

Why poll actively? Each async operation you spawn hands back an IAsyncResult that has a WaitHandle. Use WaitAny() and let the system notify you:

/// <summary>
/// Do something useful with a completed query
/// </summary>
/// <param name="result"></param>
/// <returns>
///   true if the query failed;
///   false if the query was successful
/// </returns>
private static bool DoSomethingUseful( IAsyncResult result )
{
    throw new NotImplementedException() ;
}

static void Main( string[] args )
{
    List<IAsyncResult> batch         = SpawnBatch() ;
    bool               errorOccurred = ProcessCompletedBatches( batch , DoSomethingUseful) ;

    if ( errorOccurred )
    {
        CancelPending( batch ) ;
    }

    return ;

}

public static bool ProcessCompletedBatches( List<IAsyncResult> pending , Func<IAsyncResult,bool> resultHandler )
{
    bool errorOccurred = false ;

    while ( ! errorOccurred && pending.Count > 0 )
    {
        WaitHandle[] inFlight = pending.Select( x => x.AsyncWaitHandle ).ToArray() ;

        int offset = WaitHandle.WaitAny( inFlight ) ;

        IAsyncResult result = pending[offset] ;
        pending.RemoveAt(offset) ;

        errorOccurred = resultHandler(result) ;

    }

    return errorOccurred ;

}
Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
1

Not enough info to suggest a way you should go, but defintely one you might, and that would be Task Parallel library and Task<T>.

Loads of fun. However, don't go mad with it, you could easily end up with your super duper multithreaded effort being slower than your synchronous batch.

If say T, was connect to a db, throw a query at it, return a datareader, and say there were 8 queries in a batch.

You set up the tasks 0 to 7, start a timer for timeout, kick them all off. As they complete you keep the readerr and set a bit of your flag based on the task id. When it gets to 255 raise an OnBatchComplete event, copy out your readers and pass them to the combine task. Timeout goes off first, act accordingly. If there's an error in task, have it return some suitable reason, bubble out to the caller, possibly killing any still running queries.

Don't know how your combine process works either, but if it could be organised so say once queries 1 & 3 are ready, you could do an intermediate process, or if it's in some logical order, once all the queries are ready to read, could you start reading into to simple classes and then throw each one at a combine task....

It's not fair I don't get fun stuff like this....

Tony Hopkinson
  • 20,172
  • 3
  • 31
  • 39
0

Rather than polling you need to be using an async callback. http://msdn.microsoft.com/en-us/library/system.asynccallback.aspx

What is AsyncCallback?

Community
  • 1
  • 1
emd
  • 1,173
  • 9
  • 21
  • Except that handing results back to the client likely has to be single-threaded. Your callback handler will need to serialize so you don't have two competing callbacks trying to hand data back to the client. You might as well use `WaitHandle.WaitAny()`. – Nicholas Carey Feb 10 '12 at 21:39