
I have a job that performs the following two tasks:

  1. Read up to 300 unique customerIds from the database into a List.
  2. Then call a stored procedure for each customerId; the SP executes some queries, creates an XML document (up to 10 KB), and stores the XML in a database table.

So, in this case there should be 300 records in the table.

On average, the SP takes around 3 seconds to process each customer, including the XML creation. That means it takes about 15 minutes in total to process all 300 customers. The problem is that in the future it may become even more time-consuming.

I don't want to go with the bulk-insert option by moving the XML-creation logic into the application. With bulk insert, I wouldn't be able to tell which customerId's data was the problem if XML creation failed. So I want to call the SP once per customer.

To process all customers in parallel, I created 4 dedicated threads, each processing its own collection of unique customerIds. Together the 4 threads processed all 300 customers in 5 minutes, which is what I was expecting.

However, I want to use the ThreadPool rather than creating my own threads.

I want to have two types of threads here. One type processes and creates the XML for each customer; the other works on the customers whose XML has already been created. This second thread will call an SP that updates a flag on the customer table based on whether the customer's XML is available.

So what is the best way to process 300 customers quickly and in parallel, while also updating the customer table in parallel or on a separate thread?

Are dedicated threads still a good option here, or should I use Parallel.ForEach or await Task.WhenAll?

I know Parallel.ForEach will block the main thread, which I want to use for updating the customer table.

asked by PowerTech; edited by James Z

2 Answers


You have to choose among several options for the implementation. First of all, choose the scheme you will use. You can implement your algorithm in coroutine fashion: whenever the thread needs some long-running data, it yields execution with the await construct.

// It can be run inside the `async void` event handler from your UI.
// As it is async, the UI thread wouldn't be blocked
async Task SaveAll()
{
    for(int i = 0; i < 100; ++i)
    {
        // somehow get a started task for saving customer (i) on this thread
        await SaveAsync(i);
    }
}

// This method is our first coroutine: it first starts fetching the data
// and after that saves the result to the database
async Task SaveAsync(int customerId)
{
    // at this point we yield the work to some other method,
    // as at this moment we have nothing to do ourselves
    var customerData = await FetchCustomer(customerId);
    // at this moment we start saving the data asynchronously
    // and yield execution another time
    var result = await SaveCustomer(customerData);
    // at this line we can update the UI with the result
}

FetchCustomer and SaveCustomer can use the TPL (they could be replaced with anonymous methods, but I don't like that approach). Task.Run will execute the code on the default ThreadPool, so the UI thread won't be blocked (more about this method in Stephen Cleary's blog):

async Task<CustomerData> FetchCustomer(int customerId)
{
    return await Task.Run(() => DataRepository.GetCustomerById(customerId));
}

// ? here is a placeholder for your result type
async Task<?> SaveCustomer(CustomerData customer)
{
    return await Task.Run(() => DataRepository.SaveCustomer(customer));
}

I also suggest that you examine the related articles from that blog.

Another option is to use the TPL Dataflow extension, very similar to this answer:

Nesting await in Parallel foreach

I suggest you examine the contents of the linked post and decide for yourself which approach you will implement.
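For illustration, here is a minimal sketch of the Dataflow approach applied to this question's two-stage pipeline (create the XML, then flag the customer). The `CreateXmlAsync` and `UpdateFlagAsync` methods are hypothetical placeholders standing in for your two stored-procedure calls; `MaxDegreeOfParallelism` caps how many SP calls run concurrently.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class CustomerPipeline
{
    static async Task ProcessAll(int[] customerIds)
    {
        // Stage 1: create the XML for each customer, up to 4 SP calls at a time.
        var createXml = new TransformBlock<int, int>(
            async id =>
            {
                await CreateXmlAsync(id); // hypothetical wrapper around your XML-creation SP
                return id;
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // Stage 2: flag each customer whose XML now exists.
        var updateFlag = new ActionBlock<int>(
            id => UpdateFlagAsync(id),    // hypothetical wrapper around the flag-update SP
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

        // Completion propagates so awaiting the last block awaits the whole pipeline.
        createXml.LinkTo(updateFlag, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var id in customerIds)
            createXml.Post(id);

        createXml.Complete();
        await updateFlag.Completion;
    }

    static Task CreateXmlAsync(int id) => Task.Delay(10);   // placeholder for the real SP call
    static Task UpdateFlagAsync(int id) => Task.Delay(10);  // placeholder for the real SP call
}
```

A failed XML creation surfaces per customer inside the first block, which addresses the original concern about not knowing which customerId failed.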

answered by VMAtm
  • Thanks for posting the solution. If I use this technique, how much total time would it take to process all 300 customers, if the SP takes 3 secs for each customer? – PowerTech Mar 12 '16 at 03:39
  • Well, I can't say for sure, that'll depend on the number of cores, but if everything is done the right way, the results should be similar to what you got with thread creation. – VMAtm Mar 12 '16 at 21:13
  • Thanks. Before this I tried to implement the technique in a very simple console application, in the same way you said above, but the loop executes sequentially. I have asked about the problem in the post below. One person is advising to use Task.WhenAll. How is that different from what you are saying here? http://stackoverflow.com/questions/35965717/loop-with-asyn-and-await?noredirect=1#comment59586523_35965717 – PowerTech Mar 13 '16 at 02:43
  • What do you mean by `sequential`? In the code you've implemented you're calling `Thread.Sleep`. This is an emulation of calculations, not waiting for a long-running task. If this is your case, then you should use `Task.WhenAll`, as that is multi-threading, not asynchronous code (as I understood, this works for you). At the highest level of your call stack, simply use `await Task.WhenAll()` – VMAtm Mar 13 '16 at 09:16
  • Thanks VMAtm. I meant to say the loop is not executing asynchronously. As I said in my question (above), my long-running task takes between 2 and 5 seconds, so I used Thread.Sleep(2000). Even when I increased it to 5 seconds, still no improvement. I agree `await Task.WhenAll()` will resolve the issue; however, I still want to understand your intention in writing the above solution and in what case it would run asynchronously. I don't think it matters whether I use Thread.Sleep(5000) or actually call the long-running task (which takes 5 secs max). – PowerTech Mar 13 '16 at 16:14
  • Yes, I understand that. I'll try to implement a simple sample for you, but right now I can't find the time for it, sorry. – VMAtm Mar 14 '16 at 07:27
  • Thanks VMAtm for your help! – PowerTech Mar 16 '16 at 03:32
  1. I would try to solve your task completely within SQL. This would greatly reduce server round trips.

  2. I think you can create an enumeration of tasks, each one dealing with one record, and then call Task.WhenAll() to run them all.
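The second point can be sketched as follows. This is a minimal example, not a drop-in implementation: `SaveCustomerAsync` is a hypothetical stand-in for the per-customer SP call, and the `SemaphoreSlim` throttle is an added assumption so that 300 concurrent SP calls don't hit the database at once.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class WhenAllRunner
{
    static async Task ProcessAll(int[] customerIds)
    {
        // Allow at most 4 SP calls in flight at any moment.
        var throttle = new SemaphoreSlim(4);

        var tasks = customerIds.Select(async id =>
        {
            await throttle.WaitAsync();
            try
            {
                await SaveCustomerAsync(id); // hypothetical wrapper around the SP call
            }
            finally
            {
                throttle.Release();
            }
        });

        // All tasks run concurrently (up to the throttle); this awaits them all.
        await Task.WhenAll(tasks);
    }

    static Task SaveCustomerAsync(int id) => Task.Delay(10); // placeholder for the real SP call
}
```

With 300 customers, a 3-second SP, and 4 concurrent calls, this should finish in roughly the same ~4–5 minutes as the hand-rolled four-thread version, without creating dedicated threads.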