
I have some code that, at the end of the program's life, uploads the entire contents of 6 different lists into a database. The problem is, they're parallel lists with about 14,000 items in each, and I have to run an INSERT query for each of their separate items. This takes a long time; is there a faster way to do this? Here's a sample of the relevant code:

    public void uploadContent()
    {
        var cs = Properties.Settings.Default.Database;
        SqlConnection dataConnection = new SqlConnection(cs);
        dataConnection.Open();

        for (int i = 0; i < urlList.Count; i++)
        {
            SqlCommand dataCommand = new SqlCommand(Properties.Settings.Default.CommandString, dataConnection);
            try
            {
                dataCommand.Parameters.AddWithValue("@user", userList[i]);
                dataCommand.Parameters.AddWithValue("@computer", computerList[i]);
                dataCommand.Parameters.AddWithValue("@date", timestampList[i]);
                dataCommand.Parameters.AddWithValue("@itemName", domainList[i]);
                dataCommand.Parameters.AddWithValue("@itemDetails", urlList[i]);
                dataCommand.Parameters.AddWithValue("@timesUsed", hitsList[i]);

                dataCommand.ExecuteNonQuery();
            }
            catch (Exception e)
            {
                using (StreamWriter sw = File.AppendText("errorLog.log"))
                {
                    sw.WriteLine(e);
                }
            }

        }
        dataConnection.Close();
    }

Here is the command string the code is pulling from the config file:

CommandString:

INSERT dbo.InternetUsage VALUES (@user, @computer, @date, @itemName, @itemDetails, @timesUsed)
  • I've generated XML before now, uploaded to a sproc that takes the XML and uses XML queries on the server side to process the nodes into table rows. That means it's just one call, and the server does all the processing of the thousands of rows. – Lloyd Oct 23 '12 at 13:21
  • Which database are you using? Generally you would want to run multiple inserts in one command. For example, please see this post: http://stackoverflow.com/questions/2624713/how-do-i-insert-multiple-rows-without-repeating-the-insert-into-dbo-blah-part – Simon Oct 23 '12 at 13:22
  • @Simon I'm going to give that a shot. I didn't think about doing that. Thanks. – Ryan Duffing Oct 23 '12 at 13:25
  • Please be aware that you will probably be generating the SQL command dynamically, and you need to ensure it doesn't suffer from SQL injection. Other options may be to look at utilities like SqlBulkCopy, to pass table-valued parameters to stored procedures, or even to request an empty DataTable, add all rows, then commit the changes after all the inserts. – Simon Oct 23 '12 at 13:27
  • Check this thread: http://stackoverflow.com/questions/2972974/how-should-i-multiple-insert-multiple-records . I think it is similar to your question. – ígor Oct 23 '12 at 13:28
  • @Simon Your method would have worked wonderfully, but it looks like I'll have to go with BULK INSERT, as I have more than 1000 rows to insert. – Ryan Duffing Oct 23 '12 at 13:56

4 Answers

  1. Create the command and its parameters once, outside of the `for (int i = 0; i < urlList.Count; i++)` loop.
  2. Also run the inserts within a transaction.
  3. If possible, create a stored procedure and pass the parameters as a DataTable.
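
A sketch of point 3, assuming you can add a table-valued parameter on the server. The table type, stored procedure, and column definitions below are hypothetical (the question doesn't show the table's schema), but the shape of the call is standard ADO.NET:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Assumes a user-defined table type and stored procedure like these exist
// (hypothetical names, shown here only for illustration):
//
//   CREATE TYPE dbo.InternetUsageType AS TABLE
//       ([User] nvarchar(50), Computer nvarchar(50), [Date] datetime,
//        ItemName nvarchar(200), ItemDetails nvarchar(max), TimesUsed int);
//
//   CREATE PROCEDURE dbo.InsertInternetUsage @rows dbo.InternetUsageType READONLY
//   AS INSERT dbo.InternetUsage SELECT * FROM @rows;

var table = new DataTable();
table.Columns.Add("User", typeof(string));
table.Columns.Add("Computer", typeof(string));
table.Columns.Add("Date", typeof(DateTime));
table.Columns.Add("ItemName", typeof(string));
table.Columns.Add("ItemDetails", typeof(string));
table.Columns.Add("TimesUsed", typeof(int));

// Copy the six parallel lists into rows of the in-memory table.
for (int i = 0; i < urlList.Count; i++)
    table.Rows.Add(userList[i], computerList[i], timestampList[i],
                   domainList[i], urlList[i], hitsList[i]);

using (var connection = new SqlConnection(Properties.Settings.Default.Database))
using (var command = new SqlCommand("dbo.InsertInternetUsage", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@rows", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.InternetUsageType";

    connection.Open();
    command.ExecuteNonQuery(); // one round trip for all ~14,000 rows
}
```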
alerya

As mentioned in @alerya's answer, doing the following will help (explanation added here):

1) Create the command and parameters outside of the for loop. Since the same command is being used each time, it doesn't make sense to re-create it on every iteration. In addition to creating a new object (which takes time), the command must also be verified for several things each time it is created (that the table exists, etc.). This introduces a lot of overhead.

2) Put the inserts within a transaction. Putting all of the inserts within a transaction will speed things up because, by default, a command that is not within a transaction is treated as its own transaction. Therefore, every time you insert something, the database server must verify that what was just inserted is actually saved (usually on a hard disk, which is limited by the speed of the disk). When multiple INSERTs are within one transaction, however, that check only needs to be performed once.

The downside to this approach, based on the code you've already shown, is that one bad INSERT will spoil the bunch. Whether or not this is acceptable depends on your specific requirements.
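
Applied to the question's code, points 1 and 2 might look like the sketch below. The `SqlDbType` choices and column sizes are assumptions, since the actual schema isn't shown:

```csharp
using System.Data;
using System.Data.SqlClient;

using (var connection = new SqlConnection(Properties.Settings.Default.Database))
{
    connection.Open();

    using (var transaction = connection.BeginTransaction())
    using (var command = new SqlCommand(Properties.Settings.Default.CommandString,
                                        connection, transaction))
    {
        // Point 1: create the parameters once, outside the loop.
        // (Types and sizes below are guesses; match them to your schema.)
        command.Parameters.Add("@user", SqlDbType.NVarChar, 50);
        command.Parameters.Add("@computer", SqlDbType.NVarChar, 50);
        command.Parameters.Add("@date", SqlDbType.DateTime);
        command.Parameters.Add("@itemName", SqlDbType.NVarChar, 200);
        command.Parameters.Add("@itemDetails", SqlDbType.NVarChar, -1);
        command.Parameters.Add("@timesUsed", SqlDbType.Int);

        for (int i = 0; i < urlList.Count; i++)
        {
            // Only the values change per row; the command object is reused.
            command.Parameters["@user"].Value = userList[i];
            command.Parameters["@computer"].Value = computerList[i];
            command.Parameters["@date"].Value = timestampList[i];
            command.Parameters["@itemName"].Value = domainList[i];
            command.Parameters["@itemDetails"].Value = urlList[i];
            command.Parameters["@timesUsed"].Value = hitsList[i];
            command.ExecuteNonQuery();
        }

        // Point 2: one commit verifies all ~14,000 inserts at once.
        transaction.Commit();
    }
}
```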

Aside: Another thing you really should be doing (though this won't speed things up in the short term) is properly using the IDisposable interface. This means either calling .Dispose() on all IDisposable objects (SqlConnection, SqlCommand) or, ideally, wrapping them in using() blocks:

using (SqlConnection dataConnection = new SqlConnection(cs))
{
    // Code goes here
}

This ensures the unmanaged resources behind these objects (such as the underlying database connection) are released promptly, which becomes a real problem quickly if your loops get large.

sybkar
  • Thanks. I created the command outside of the loop, I then used Bulk Insert, so it's all within one transaction. Now I'm just waiting for our db admin to give me bulk load permissions on the database. – Ryan Duffing Oct 23 '12 at 14:17
  • @RyanDuffing, that's good. Just make sure you use the `using()` construct or, at a minimum, call `.dispose()` on each `IDisposable` object! – sybkar Oct 23 '12 at 14:28

Sending INSERT commands one by one to a database will really make the whole process slow, because of the round trips to the database server. If you're worried about performance, you should consider using a bulk insert strategy. You could:

  1. Generate a flat file with all your information, in the format that BULK INSERT understands.
  2. Use the BULK INSERT command to import that file to your database (http://msdn.microsoft.com/en-us/library/ms188365(v=sql.90).aspx).
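
As an alternative to writing out a flat file, the comments also mention SqlBulkCopy, which streams the rows to the server in bulk directly from the client. A minimal sketch, assuming the DataTable columns line up with dbo.InternetUsage (the column names below are guesses based on the parameter names in the question):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Build an in-memory table whose columns match dbo.InternetUsage.
var table = new DataTable();
table.Columns.Add("User", typeof(string));
table.Columns.Add("Computer", typeof(string));
table.Columns.Add("Date", typeof(DateTime));
table.Columns.Add("ItemName", typeof(string));
table.Columns.Add("ItemDetails", typeof(string));
table.Columns.Add("TimesUsed", typeof(int));

for (int i = 0; i < urlList.Count; i++)
    table.Rows.Add(userList[i], computerList[i], timestampList[i],
                   domainList[i], urlList[i], hitsList[i]);

using (var connection = new SqlConnection(Properties.Settings.Default.Database))
using (var bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "dbo.InternetUsage";
    connection.Open();
    bulkCopy.WriteToServer(table); // one bulk operation instead of ~14,000 INSERTs
}
```

Note that, like BULK INSERT, this typically requires bulk-load permissions on the database.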

P.S. I'm assuming that when you say SQL, you mean MS SQL Server.

Fabio

Why don't you run your uploadContent() method from a separate thread?
That way you don't need to worry about how long the query takes to execute.

  • He's concerned about how long the inserts are taking. He didn't say it was stopping him from doing something else. He also specified that it's at the end of the program, so doing it on another thread wouldn't really make a difference in terms of preventing other things from happening. – sybkar Oct 23 '12 at 14:15