7

I am working on an existing application. This application reads data from a huge file and then, after doing some calculations, it stores the data in another table.

But the loop doing this (see below) is taking a really long time. Since the file sometimes contains thousands of records, the entire process takes days.

Can I replace this foreach loop with something else? I tried using Parallel.ForEach and it did help. I am new to this, so I would appreciate your help.

foreach (record someRecord in someReport.r)
{
    try
    {
        using (var command = new SqlCommand("[procname]", sqlConn))
        {
            command.CommandTimeout = 0;
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.Add(…);

            IAsyncResult result = command.BeginExecuteReader();
            while (!result.IsCompleted)
            {
                System.Threading.Thread.Sleep(10);
            }
            command.EndExecuteReader(result);
        }
    }
    catch (Exception e)
    {
        …
    }
}

After reviewing the answers, I removed the async calls and edited the code as below. But this did not improve performance.

using (var command = new SqlCommand("[sp]", sqlConn))
{
    command.CommandTimeout = 0;
    command.CommandType = CommandType.StoredProcedure;
    foreach (record someRecord in someReport.r)
    {
        command.Parameters.Clear();
        command.Parameters.Add(…);
        command.Prepare();

        using (var dr = command.ExecuteReader())
        {
            while (dr.Read())
            {
                if (…)
                {

                }
                else if (…)
                {

                }
            }
        }
    }
}
Omar
user1110790
  • Two thoughts – first, you're doing async wrong and as a result you are likely sleeping for MANY items in the loop. Second, can you reuse the SqlCommand object for the entire loop instead of creating/destroying one each time? – n8wrl Aug 30 '12 at 17:20
    If you tell us more about what it is that you're trying to accomplish, we could potentially show you a solution in SQL that runs several orders of magnitude faster, and avoids this whole async/parallel business altogether. – Robert Harvey Aug 30 '12 at 17:25
    @user1110790: The code you posted was full of errors (and still has at least one), so I've cleaned it up a bit. May I humbly suggest that when you post on SO, make sure your code is OK; otherwise you might just get lots of comments focusing on that, instead of on the actual issue. – stakx - no longer contributing Aug 30 '12 at 17:29
  • **Off-topic:** While I agree with others here that you are using the async methods in the wrong way, let me add that you should never poll an `AsyncResult` like that: `while (!result.IsCompleted) Thread.Sleep(…);`. Instead, you should do this: `result.AsyncWaitHandle.WaitOne();` While this will also block the calling thread, it doesn't need any polling; the OS will wake up the calling thread upon completion. – stakx - no longer contributing Aug 30 '12 at 17:40
  • @RobertHarvey, so this is actually part of a service which reads a file containing account information and their online usage. Based on the data, we are calculating the total usage and adding that information to 3 tables using the stored procedure. The stored procedure simply updates an existing record or inserts a new record. The tables are small and only contain 5 columns. – user1110790 Aug 30 '12 at 18:59
  • @RobertHarvey The problem is the files are usually huge, containing almost 36,000 records on average. Based on what I have seen, the time to process each record increases as the number of records grows. I would really appreciate it if we could figure out another way of doing this. – user1110790 Aug 30 '12 at 19:00
  • Maybe try using a profiler. It may give you a hint about what is going on. – Biggles Sep 03 '12 at 14:04

6 Answers

8

Instead of hitting the SQL connection so many times in a loop, have you ever considered extracting the whole set of data from SQL Server and processing it via a dataset?

Edit: I decided to further explain what I meant. You can do the following (pseudo code, with a sketch below):

  1. Use a SELECT * query to get all the information from the database and store it in a list of your class, or in a dictionary.
  2. Do your foreach (record someRecord in someReport.r) and do the condition matching as usual.
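
For illustration, here is a minimal sketch of that approach. It assumes a hypothetical Account class and Accounts table; sqlConn, someReport.r, and the record fields come from the question (requires System.Collections.Generic and System.Data.SqlClient).

class Account
{
    public string AccountId;
    public long TotalUsage;
}

// Pull the data down once, then match records against an in-memory
// dictionary instead of round-tripping to SQL Server on every iteration.
var existing = new Dictionary<string, Account>();

using (var command = new SqlCommand("SELECT AccountId, TotalUsage FROM Accounts", sqlConn))
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        var account = new Account
        {
            AccountId = reader.GetString(0),
            TotalUsage = reader.GetInt64(1)
        };
        existing[account.AccountId] = account;
    }
}

foreach (record someRecord in someReport.r)
{
    Account match;
    if (existing.TryGetValue(someRecord.AccountId, out match))
    {
        // existing row: apply the update logic in memory
    }
    else
    {
        // no row yet: queue an insert instead
    }
}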
Guo Hong Lim
    +1. But it'd probably be better to load the data into a strongly typed collection and then use Linq on that, rather than using a DataSet. – Steve Wortham Aug 30 '12 at 17:38
  • I tried using a dataset, but for some reason it further slowed down the process. We are also logging every single operation. Do you think doing the logging on a separate thread would improve performance? – user1110790 Aug 30 '12 at 18:21
  • @user1110790 - From my experience, datasets are generally slow to work with. That's why I suggested a strongly typed collection. Simply working with an IEnumerable collection in memory is going to be very fast. And if you're doing a lot of key lookups it could be made even faster with a `Dictionary`. – Steve Wortham Aug 30 '12 at 19:49
  • @SteveWortham this will probably require changes in the base class. Can you send me an example? It looks like this problem does not have a simple solution. Thanks so much – user1110790 Aug 30 '12 at 20:21
  • @user1110790 - I just posted an answer describing how I would do it... http://stackoverflow.com/a/12205953/102896 – Steve Wortham Aug 30 '12 at 21:43
  • Edited my answer to explain more about how you can achieve what I suggested. – Guo Hong Lim Aug 31 '12 at 05:04
  • @GuoHongLim the current system reads data from the flat file and, based on the data, calculates total values etc. Now these values are being stored in a list; `record someRecord in someReport` is referring to that list. – user1110790 Sep 04 '12 at 03:25
6

Step 1: Ditch the attempt at async. It isn't implemented properly and you're blocking anyway. Just execute the procedure and see if that helps.

Step 2: Move the SqlCommand outside of the loop and reuse it for each iteration. That way you don't incur the cost of creating and destroying it for every item in your loop.

Warning: Make sure you reset/clear/remove parameters you don't need from the previous iteration. We did something like this with optional parameters and had 'bleed-thru' from the previous iteration because we didn't clean up parameters we didn't need!
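
A minimal sketch of steps 1 and 2 combined, reusing the [procname], sqlConn, and someReport.r names from the question (the parameter name and record field are made-up placeholders):

using (var command = new SqlCommand("[procname]", sqlConn))
{
    command.CommandTimeout = 0;
    command.CommandType = CommandType.StoredProcedure;

    // one command object for the whole loop
    foreach (record someRecord in someReport.r)
    {
        // clear the previous iteration's parameters so optional
        // parameters can't 'bleed-thru' into this call
        command.Parameters.Clear();
        command.Parameters.AddWithValue("@accountId", someRecord.AccountId); // hypothetical parameter

        command.ExecuteNonQuery();
    }
}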

Omar
n8wrl
3

Your biggest problem is that you're looping over this:

IAsyncResult result = command.BeginExecuteReader();

while (!result.IsCompleted)
{
   System.Threading.Thread.Sleep(10);
}

command.EndExecuteReader(result);

The entire idea of the asynchronous model is that the calling thread (the one doing this loop) should be spinning up ALL of the asynchronous tasks using the Begin method before starting to work with the results with the End method. If you are using Thread.Sleep() within your main calling thread to wait for an asynchronous operation to complete (as you are here), you're doing it wrong, and what ends up happening is that each command, one at a time, is being called and then waited for before the next one starts.

Instead, try something like this:

public void BeginExecutingCommands(Report someReport)
{
    foreach (record someRecord in someReport.r) 
    {
        // note: no using block here; the command must stay alive until
        // the async callback completes, so it is disposed in ReaderExecuted
        var command = new SqlCommand("[procname]", sqlConn);

        command.CommandTimeout = 0;
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add(…);

        command.BeginExecuteReader(ReaderExecuted, 
            new object[] { command, someReport, someRecord });                   
    }
}

void ReaderExecuted(IAsyncResult result)
{
    var state = (object[])result.AsyncState;
    var command = state[0] as SqlCommand;
    var someReport = state[1] as Report;
    var someRecord = state[2] as Record;

    try
    {
        using (SqlDataReader reader = command.EndExecuteReader(result))
        {
            // work with reader, command, someReport and someRecord to do what you need.
        }
    }
    catch (Exception ex)
    {
        // handle exceptions that occurred during the async operation here
    }
    finally
    {
        // dispose of the command here, since it could not be wrapped
        // in a using block back in BeginExecutingCommands
        command.Dispose();
    }
}
stakx - no longer contributing
KeithS
  • I've removed the `public` accessibility modifier from the callback method (`ReaderExecuted`). Those should not be public, since they are not complete operations, but only the logical "remainder" of another method. – stakx - no longer contributing Aug 30 '12 at 17:56
  • +1 for demonstrating the proper use of the `Begin…`/`End…` async pattern. However, I am not 100% sure that this will fix the main issue. I am also not certain how well the thread pool and the DB will deal with possibly 1,000s of almost simultaneous requests...? – stakx - no longer contributing Aug 30 '12 at 18:02
  • The thread pool will schedule a certain number of them as fast as they come in, up to a "minimum" threshold. Then, it will spin up 4 threads per second while the number of threads is above that threshold, until it reaches a maximum threshold, at which point it will hold any new requests. The minimum and maximum thresholds are configurable (see the sketch after these comments). – KeithS Aug 30 '12 at 18:04
  • As for the DB, I've seen MSSql get pounded on pretty hard. Our main DB server has about 24 EUs and handles pretty much our entire business except for reporting (we ship transaction logs to a reporting DB to avoid locking issues). I know for a fact that at least one non-locking status query is executed against our main DB an average of 100 times *per second*. No problems. – KeithS Aug 30 '12 at 18:06
  • Thanks for answering my two concerns, I didn't know that. Good to know! – stakx - no longer contributing Aug 30 '12 at 18:08
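
For reference, a sketch of how the thread-pool thresholds mentioned above can be inspected and adjusted; the values below are arbitrary examples, not recommendations:

using System;
using System.Threading;

class ThreadPoolConfigDemo
{
    static void Main()
    {
        int workerMin, iocpMin, workerMax, iocpMax;
        ThreadPool.GetMinThreads(out workerMin, out iocpMin);
        ThreadPool.GetMaxThreads(out workerMax, out iocpMax);

        Console.WriteLine("min: {0} worker / {1} IOCP", workerMin, iocpMin);
        Console.WriteLine("max: {0} worker / {1} IOCP", workerMax, iocpMax);

        // raise the minimum so a burst of queued callbacks can start
        // immediately instead of waiting for the pool's gradual ramp-up
        ThreadPool.SetMinThreads(32, 32);

        // cap the maximum to keep a flood of concurrent requests
        // from overwhelming the database
        ThreadPool.SetMaxThreads(256, 256);
    }
}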
1

On the SQL Server side, the other end of a write is a (single) disk, so you can rarely write faster in parallel; in fact, writing in parallel often slows things down due to index fragmentation. If you can, sort the data by the primary (clustered) key prior to loading. For a big load, even consider disabling the other indexes, loading the data, and then rebuilding the indexes.

Not really sure what you were doing with the async calls, but it was certainly not doing what you expected, as it was waiting on itself.

try
{
    using (var command = new SqlCommand("[procname]", sqlConn))
    {
        command.CommandTimeout = 0;
        command.CommandType = CommandType.StoredProcedure;

        foreach (record someRecord in someReport.r)
        {
            command.Parameters.Clear();
            command.Parameters.Add(…);

            using (var rdr = command.ExecuteReader())
            {
                while (rdr.Read())
                {
                    …
                }
            }
        }
    }
}
catch (…)
{
    …
}
stakx - no longer contributing
paparazzo
  • @stakx Will a rdr.Close() take care of that? – paparazzo Aug 30 '12 at 17:46
  • IIRC `rdr.Close()` would have the same effect as `rdr.Dispose()`... but a `using` block is easier than wrapping `rdr.Close()` in a `finally` clause (which you'd have to do for exception safety). – stakx - no longer contributing Aug 30 '12 at 17:48
  • OK but at the overhead of creating a new rdr each loop. What exception would not be caught on the Catch if rdr was reused? – paparazzo Aug 30 '12 at 17:51
  • You cannot actually re-use them across several queries. For all practical purposes, you should assume that each call to `SqlCommand.ExecuteReader()` will return a fresh instance of `SqlDataReader`. Just because you reference all these different instances with the same *variable* does not re-use them in any way. – stakx - no longer contributing Aug 30 '12 at 18:04
  • @stakx I have to say I did not believe you, but I ran some tests and they don't dispute what you said; and for readability I like the using syntax. – paparazzo Aug 31 '12 at 19:37
1

As we were talking about in the comments, storing this data in memory and working with it there may be a more efficient approach.

So one easy way to do that is to start with Entity Framework. Entity Framework will automatically generate the classes for you based on your database schema. Then you can import a stored procedure which holds your SELECT statement. The reason I suggest importing a stored proc into EF is that this approach is generally more efficient than doing your queries in LINQ against EF.

Then run the stored proc and store the data in a List like this...

var data = db.MyStoredProc().ToList();

Then you can do anything you want with that data. Or as I mentioned, if you're doing a lot of lookups on primary keys then use ToDictionary() something like this...

var data = db.MyStoredProc().ToDictionary(k => k.MyPrimaryKey);
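
From there, per-record lookups stay in memory. A sketch of what the matching loop might look like (MyEntity stands in for whatever type MyStoredProc() returns, and someReport.r comes from the question):

foreach (record someRecord in someReport.r)
{
    MyEntity existing;
    if (data.TryGetValue(someRecord.MyPrimaryKey, out existing))
    {
        // key found: apply the update logic to the in-memory entity
    }
    else
    {
        // key not found: this record needs an insert
    }
}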

Either way, you'll be working with your data in memory at this point.

Steve Wortham
0

It seems executing your SQL command puts a lock on some required resources, and that's what forced you to use async methods (my guess).

If the database is not otherwise in use, try getting exclusive access to it. If even then there are internal transactions due to data-model complexity, consider consulting the database designer.

Xaqron